A Unix file is a sequence of 8-bit bytes. Files have names and they are organized in directories. In this section we show how to manipulate files without looking at their contents.
mkdir dirname makes a new directory and rmdir dirname removes it. rmdir agrees to remove only an empty directory. cp copies files, mv moves them and rm removes them:
cp fn fnnew # copy a file, new name is fnnew
cp fn ../tex/fnnew # copy, placing the copy in a different directory
cp fn1 fn2 fn3 ~/dir # copy fn1, fn2, fn3 into dir, which is under the home directory
cp -r dir1 dir2 # copy whole directory recursively
cp -r -p dir1 dir2 # preserve permissions etc.
mv fn fnnew # rename fn to be fnnew
mv fn1 fn2 ... dir # move files to directory
mv dir1 dir2 # move directory (SAME FILE SYSTEM)
mv dir1 dir2 ... dirN # move directories (SAME FILE SYSTEM)
rm fn # remove a file
rm fn1 fn2 # remove the named files
rm -r dir # remove directory recursively (all its contents)
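Thus, if cp or mv is given more than two arguments, the last one must be an existing directory. Traditionally mv refuses to move entire directories across file systems, because this would involve physically copying them and then removing the originals. Newer implementations such as GNU mv fall back to exactly this copy-and-remove procedure automatically.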
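The rules above can be tried out safely in a scratch directory. The following is only a sketch; the names a.txt, b.txt, dest and missing are made up for the illustration.

```shell
# Scratch directory, removed automatically when the shell exits.
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

echo one > a.txt
echo two > b.txt
mkdir dest

cp a.txt b.txt dest          # OK: the last argument is an existing directory
ls dest                      # lists a.txt and b.txt

# With more than two arguments the last one must be a directory,
# so this fails because "missing" does not exist.
if ! cp a.txt b.txt missing 2>/dev/null; then
    echo "cp refused: 'missing' is not a directory"
fi

# rmdir removes only empty directories.
if ! rmdir dest 2>/dev/null; then
    echo "rmdir refused: dest is not empty"
fi
rm dest/a.txt dest/b.txt
rmdir dest                   # succeeds now that dest is empty
```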
The cp, mv and rm commands can be given the -i option to ask the user before overwriting or removing anything. On some systems this is the default (alias cp 'cp -i' has been done in C-shell; in the Bourne-Again Shell bash, alias cp='cp -i'). The -f flag can be used to force the commands through without asking questions.
Any of the commands cp, mv, rm can remove or overwrite data permanently. Either use the -i option or think twice before hitting Return.
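A small experiment shows the difference between -i and -f. It is a sketch with made-up file names (source.txt, target.txt), and the answer "n" is fed to the prompt through a pipe instead of being typed.

```shell
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

echo "old contents" > target.txt
echo "new contents" > source.txt

# Answer "n" to the overwrite prompt; mv then leaves both files alone.
# (Some mv versions return a non-zero status when the move is declined.)
echo n | mv -i source.txt target.txt || true
cat target.txt               # still: old contents

# -f never asks; the old data is gone for good.
mv -f source.txt target.txt
cat target.txt               # now: new contents
```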
Symbolic links are created by ln -s truefile linkname. Symbolic links can point across file systems without restrictions, and they can point to directories as well as files. Symbolic links are handy, for example, if you use LaTeX to write your papers and seminar talks and want to reuse the same figures. LaTeX finds the figure files easily if they are in the current directory, and typically you keep the files of a single talk in its own directory. Without symbolic links you would have to copy all the figure files into each talk directory, wasting disk space. With symbolic links only one copy is kept; the others are just links pointing to it.
A symbolic link is removed by rm. It removes the link, not the file it points to.
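The figure-sharing setup described above might look like the following sketch. The directory names figures and talk1 and the file fig1.eps are hypothetical.

```shell
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

mkdir figures talk1
echo "%!PS dummy figure" > figures/fig1.eps

# Link the shared figure into the talk directory instead of copying it.
# A relative target is resolved relative to the directory holding the link.
ln -s ../figures/fig1.eps talk1/fig1.eps

ls -l talk1/fig1.eps         # shows the link and the path it points to
cat talk1/fig1.eps           # reads the real file through the link

rm talk1/fig1.eps            # removes only the link
ls figures                   # fig1.eps is still there
```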
Symbolic links pointing to directories are handy for ``emulating'' different directory structures, to support software that expects to find some files in a certain place which does not exist on this system.
Hard links are generated by ln without the -s option. They are not much used nowadays because they can point only to files, not directories, and not across file systems. Usually one is better off using symbolic links instead of hard links.
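For comparison, here is a sketch of what a hard link does. The file names are made up. A hard link is simply a second name for the same data, so removing one name leaves the data reachable through the other.

```shell
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

echo "shared data" > original.txt
ln original.txt alias.txt       # no -s: a hard link, i.e. a second name

ls -li original.txt alias.txt   # both names show the same inode number

rm original.txt                 # removes one name only
cat alias.txt                   # the data is still reachable
```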
To create a tarfile of a directory, do
tar cf tarfile.tar dir # tar create file
All the files in dir are recursively put in the tarfile. To extract the contents of an existing tarfile in the current directory, do
tar xf tarfile.tar # tar extract file
To view the contents of a tarfile without actually extracting it, do
tar tvf tarfile.tar # see contents only
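The three operations can be combined into a small round trip. This is a sketch with a made-up directory called project; the tarfile is unpacked into a separate directory so that the original and the extracted copy can be compared.

```shell
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

mkdir -p project/src
echo "hello" > project/README
echo "int main(void){return 0;}" > project/src/main.c

tar cf project.tar project             # pack the directory tree
tar tf project.tar                     # list the member names

mkdir unpack
(cd unpack && tar xf ../project.tar)   # extract somewhere else
diff -r project unpack/project && echo "copies are identical"
```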
tar is also used to copy a directory recursively. One could also do this by cp -r (see cp -r above), but then the permissions, modification times etc. are reset to new values (cp -p should prevent this, but it may not do so in every respect). Anyway, tar is a foolproof way to make an exact copy of a directory in every aspect:
cd # cd to home directory
cd .. # go one level up
tar cf - pjanhune | (cd /work/pjanhune; tar xf -)
This copies the directory pjanhune (the home directory of user pjanhune) to a new place under /work/pjanhune. It thus creates /work/pjanhune/pjanhune. The - stands for standard output in the first tar: the tarfile is written to stdout. The stdout is piped to the subshell command in parentheses. This subshell first changes to /work/pjanhune, then unpacks the tarfile read from stdin (the - in the second tar). The result is a copy of the directory made without using temporary disk space.
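The same pipe can be tried on a small scale. In this sketch the made-up directories src and dest stand in for pjanhune and /work/pjanhune. The p option on the extracting side asks tar to restore the permission bits exactly as stored.

```shell
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

mkdir -p src/sub dest
echo "data" > src/sub/file.txt
chmod 640 src/sub/file.txt

# Write the tarfile to stdout, read it from stdin in the target directory.
tar cf - src | (cd dest && tar xpf -)

ls -lR dest/src              # same tree, same permission bits
```

Writing cd dest && tar instead of cd dest; tar is a small safety measure: if the cd fails, nothing is unpacked into the wrong directory.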
The option v makes tar verbose: it lists what it is doing. Sometimes the B option may be necessary when tar is reading from a pipe (on IRIX systems at least); it may also help when reading from a tape tarfile written on a different system.
The program gzip gzips (compresses) a file:
gzip file # compress file, producing file.gz (and delete old)
gzip -9 file # compress as hard as possible, taking longer
gunzip file.gz # uncompress file.gz, produce file (and delete file.gz)
gunzip <file.gz >newfile # uncompress from stdin to stdout, not deleting anything
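A round trip makes the "delete old" behaviour visible. This is a sketch with a made-up file notes.txt; a copy is kept under the name notes.orig only so that the result can be checked.

```shell
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

# A highly repetitive file compresses well.
yes "the same line again" | head -n 1000 > notes.txt
cp notes.txt notes.orig

gzip notes.txt                         # notes.txt is replaced by notes.txt.gz
ls                                     # shows notes.orig and notes.txt.gz
gunzip < notes.txt.gz > restored.txt   # stdin to stdout: notes.txt.gz is kept

cmp notes.orig restored.txt && echo "round trip OK"
```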
Gzipped tarfiles are common and typically end with .tar.gz. A shorter convention, .tgz, is common in the Linux world. The GNU/Linux version of tar can do the gzip transparently by using the z option:
tar czf tarfile.tgz dir # create a gzipped tarfile
tar xzf tarfile.tgz # extract from a gzipped tarfile
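Where tar has no z option, the same effect can be had with a pipe. The following sketch uses a made-up directory dir and unpacks into a separate directory unpack.

```shell
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

mkdir dir
echo "payload" > dir/file.txt

# Equivalent to: tar czf tarfile.tgz dir
tar cf - dir | gzip > tarfile.tgz

# Equivalent to: tar xzf tarfile.tgz (here extracted inside unpack/)
mkdir unpack
gunzip < tarfile.tgz | (cd unpack && tar xf -)
cat unpack/dir/file.txt
```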
To make a backup on a local tape device, one normally uses
cd ~/.. # cd to one level up from home dir
tar cf /dev/tape pjanhune
The tape device name may vary; it might also be /dev/rmt0, for instance. You need write permission to it, which is typically arranged by adding yourself to the group ``tape'' or ``disk'' in /etc/group, or by giving enough permissions to everyone for /dev/tape: chmod uog+rw /dev/tape.
A backup is restored using tar xf. Be extremely careful NOT to use tar xf inadvertently. If you have a tape in your drive, you could easily anti-backup your files (i.e., lose your recent changes) by doing this! Remember that c=Create New Tarfile, x=eXtract Old Tarfile.
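The "anti-backup" effect is easy to reproduce without a tape. In this sketch the ordinary file backup.tar plays the role of the tape, and docs/paper.txt is a made-up file. One cautious habit, shown first, is to extract into a separate directory and compare before touching the working copy.

```shell
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

mkdir docs
echo "version 1" > docs/paper.txt
tar cf backup.tar docs              # c: create the backup

echo "version 2" > docs/paper.txt   # work done after the backup

# Safer restore: unpack into a separate directory and compare first.
mkdir restore
(cd restore && tar xf ../backup.tar)
cat restore/docs/paper.txt          # version 1
cat docs/paper.txt                  # version 2, untouched

# Extracting in place silently replaces the newer file.
tar xf backup.tar
cat docs/paper.txt                  # version 1: the recent change is lost
```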
If the tape drive exists on a different machine, you can do one of the following:
(do cd ~/.. first)
tar cf - pjanhune | ssh -l pjanhune rem.fmi.fi 'dd of=/dev/tape'
tar cf - pjanhune | rsh -l pjanhune rem.fmi.fi 'dd of=/dev/tape'
tar cf rem.fmi.fi:/dev/tape pjanhune
In the first case the tarfile is piped to a Secure Shell command, which logs in as pjanhune on rem.fmi.fi and invokes dd there; dd copies its standard input to the output file /dev/tape. In the second case one uses the old rsh (remote shell) instead of ssh. rsh does not encrypt the traffic. In the third case one relies on the rmt (remote magnetic tape) daemon running on rem.fmi.fi.
One may have to set the buffer size for tar. See man tar to find out how.
One may also have to set the buffer size for dd using bs=8192 or something similar.
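The tar-to-dd pipe can be rehearsed locally before involving a remote tape drive. In this sketch a regular file, backup.tar, stands in for /dev/tape, and the directory data is made up; bs=8192 is the block size mentioned above, not a recommended value.

```shell
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

mkdir data
echo "something worth keeping" > data/file.txt

# dd copies stdin to the output file in blocks of 8192 bytes;
# its transfer statistics go to stderr and are discarded here.
tar cf - data | dd of=backup.tar bs=8192 2>/dev/null

tar tf backup.tar            # the archive written through dd is readable
```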
On some systems, writing to tape using dd does not work properly (then probably neither does cat), and one has to use tar. If the rmt daemon is also not running, probably for security reasons, then one has to use NFS (network file system) to mount the directories on rem.fmi.fi before the backup can succeed. Using NFS for backups probably uses more network resources than the other methods.
It is not usually a good idea to use gzip with tape backups. If a single bit error occurs, it can corrupt the whole backup. Plain tar format is more tolerant of errors: in the best case only one file is damaged by a tape bit error.
To copy files or directories from machine to machine one can use ssh, scp (or the old, less safe rsh, rcp). To use scp, the sshd daemon must be running on the remote machine. sshd typically resides in /usr/local/sbin or /usr/sbin; use locate or find to find out. If typing the password each time is no bother, this is all that is needed. For example,
scp -r dir rem.fmi.fi:/tmp/pjanhune # copy dir to rem.fmi.fi
scp -r dir pj@rem:/tmp/pj # log in as user 'pj'
If too lazy to type the password each time, one has to add the local machine's public key (stored under ~/.ssh in a file whose name ends in .pub, for example ~/.ssh/id_rsa.pub) to the remote machine's ~/.ssh/authorized_keys file. Or, one can add the local machine's name and username to the remote machine's ~/.shosts file, which is less safe. One can also add it to ~/.rhosts, which enables rcp to work and is even less safe. Check man ssh for details.
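The key-based setup can be rehearsed without a remote machine. The sketch below assumes OpenSSH's ssh-keygen is installed; the two directories local_ssh and remote_ssh are made-up stand-ins for ~/.ssh on the local and remote machines, and the empty passphrase is for the demonstration only.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/local_ssh" "$work/remote_ssh"
chmod 700 "$work/local_ssh" "$work/remote_ssh"

if command -v ssh-keygen >/dev/null 2>&1; then
    # Generate a key pair: id_ed25519 (private) and id_ed25519.pub (public).
    ssh-keygen -q -t ed25519 -N "" -f "$work/local_ssh/id_ed25519"

    # On a real system this step happens on the remote machine,
    # after the .pub file has been copied there.
    cat "$work/local_ssh/id_ed25519.pub" >> "$work/remote_ssh/authorized_keys"
    chmod 600 "$work/remote_ssh/authorized_keys"
    echo "public key appended to authorized_keys"
else
    echo "ssh-keygen not found; install OpenSSH to try this"
fi
```

Only the .pub file is ever copied to another machine; the private key stays where it was generated.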
Directories are divided into file systems, which usually correspond to physical disk units, but nowadays may also be completely virtual. To see the file systems use df -k or mount.
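Given a path argument, df reports only the file system that holds that path, which is a quick way to find out whether two directories are on the same file system. A small sketch (the variable name fs is arbitrary):

```shell
# Which file system holds the current directory, and how full is it?
df -k .

# Compare the first column of the two reports to see whether the
# current directory and /tmp share a file system.
df -k . /tmp

# -P selects the portable one-line-per-file-system output format,
# which is easier to pick apart in scripts.
fs=$(df -kP . | awk 'NR == 2 { print $1 }')
echo "current directory is on: $fs"
```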
The command mount is used to mount local and remote (NFS) file systems. The configuration file is /etc/fstab. mount -a examines /etc/fstab and tries to mount everything defined there. umount is used to unmount file systems. For example, a CD-ROM must be mounted and unmounted this way.
Every Unix file and directory has read, write and execute (rwx) permission bits for user (u), group (g) and others (o). To publish your ~/public_html directory for the Web server or other people to see, do
cd
chmod -R og+r public_html
The -R means recursive, and others (o) and group (g) are given r access to everything under public_html. This is not enough: you must also add execute permission to public_html itself and to every subdirectory:
find public_html -type d -exec chmod og+x {} ";"
If you do chmod -R og+rx public_html, the execute permission is also added to every file, which is not dangerous but a bit ugly.
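The two-step recipe can be checked on a throwaway tree. This sketch builds a made-up public_html with one subdirectory, makes it private, and then applies the commands from above. The commented-out last line shows an alternative that relies on the capital X of chmod, which adds execute permission only to directories and to files that are already executable by someone.

```shell
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

mkdir -p public_html/images
echo "<html></html>" > public_html/index.html
echo "fake image"    > public_html/images/logo.gif
chmod -R go-rwx public_html                        # start from "private"

chmod -R og+r public_html                          # read access everywhere
find public_html -type d -exec chmod og+x {} ";"   # x on directories only

ls -lR public_html

# One-step alternative:
# chmod -R og+rX public_html
```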
If you are root, you can change the owner and group of files using chown and chgrp. Modern versions of chown allow setting the group at the same time. For example,
chown -R pjanhune:luser .
changes the owner of all files to pjanhune and their group to luser (``local user'') under the current directory and all its subdirectories. Older versions of chown separate the owner and group with a dot (pjanhune.luser) instead of a colon.
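The owner:group form can be tried without root by "changing" a scratch tree to the IDs it already has. This sketch is only meant to show the syntax; it uses the numeric user and group IDs reported by id, and the directory name tree is made up.

```shell
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

mkdir -p tree/sub
touch tree/sub/file

# The current numeric user and primary group IDs.
me=$(id -u)
grp=$(id -g)

chown -R "$me:$grp" tree     # owner and group set in one command

ls -lnR tree                 # -n prints numeric owner and group IDs
```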