A Unix file is a sequence of 8-bit bytes. Files have names and they are organized in directories. In this section we show how to manipulate files without looking at their contents.
mkdir dirname makes a new directory and rmdir dirname
removes it. rmdir removes only an empty directory.
cp copies files, mv moves them and rm removes them:
cp fn fnnew # copy a file, new name is fnnew
cp fn ../tex/fnnew # copy, placing the copy in a different directory
cp fn1 fn2 fn3 ~/dir # copy fn1,fn2,fn3 in dir, which is under home directory
cp -r dir1 dir2 # copy whole directory recursively
cp -r -p dir1 dir2 # preserve permissions etc.
mv fn fnnew # rename fn to be fnnew
mv fn1 fn2 ... dir # move files to directory
mv dir1 dir2 # move directory (SAME FILE SYSTEM)
mv dir1 dir2 ... dirN # move directories (SAME FILE SYSTEM)
rm fn # remove a file
rm fn1 fn2 # remove the named files
rm -r dir # remove directory recursively (all its contents)
Thus, if cp and mv are given more than two arguments, the last
one must be an existing directory. Some versions of mv, mostly older
ones, refuse to move entire directories across file systems, because
this involves physically copying them and then removing the originals.
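The commands above can be tried safely in a scratch directory. The following is a minimal sketch; the names src, dest, a.txt and b.txt are made up for illustration, and mktemp -d is assumed to be available (it is on most current systems).

```shell
# Work in a throwaway directory so nothing real is touched.
work=$(mktemp -d)
cd "$work"

mkdir src dest                      # dest must exist before it can be a target
echo "one" > src/a.txt
echo "two" > src/b.txt

cp src/a.txt src/b.txt dest         # more than two arguments: last one is a directory
mv dest/a.txt dest/a-renamed.txt    # two arguments: plain rename
cp -r src src-copy                  # copy the whole directory recursively

copied=$(ls dest | tr '\n' ' ')
echo "dest contains: $copied"

rm -r src-copy                      # remove a directory and everything in it
rmdir src 2>/dev/null || echo "rmdir refused: src is not empty"

# Remove the scratch directory with: rm -r "$work"
```

rmdir fails here on purpose, because src still contains files; only rm -r would remove it.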
The cp, mv and rm commands accept the -i option, which asks the
user for confirmation before anything is overwritten or removed. On
some systems this is the default (set up with alias cp 'cp -i' in the
C shell, or alias cp='cp -i' in the Bourne-Again Shell, bash). The
-f flag forces the commands through without asking questions.
Any of the commands cp, mv, rm can remove or overwrite data
permanently. Either use the -i option or think twice before
hitting Return.
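To see what -i and -f do without risking real data, one can feed the answer to cp on standard input. This is only a sketch under the assumption that cp reads its confirmation from stdin, as the common implementations do; the file names are hypothetical.

```shell
work=$(mktemp -d)
cd "$work"
echo "old" > target.txt
echo "new" > source.txt

# Answer "n" on standard input: the overwrite is declined.
# (Some cp versions return a nonzero status here, hence the "|| true".)
echo n | cp -i source.txt target.txt || true
after_no=$(cat target.txt)

# Answer "y": the overwrite goes ahead.
echo y | cp -i source.txt target.txt
after_yes=$(cat target.txt)

# -f asks nothing, even when the file is write-protected.
chmod a-w target.txt
rm -f target.txt

echo "after 'n': $after_no, after 'y': $after_yes"
```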
Symbolic links are created by ln -s truefile linkname. Symbolic
links can point across file systems without restrictions, and they
can point to directories as well as files. Symbolic links are handy,
for example, if you use LaTeX to create your papers and seminar talks
and want to reuse the same figures. LaTeX finds the figure files
easily if they are in the current directory, and typically you keep
the files of each talk in their own directory. Without symbolic links
you would have to copy all the figure files into each talk directory,
wasting disk space. With symbolic links, only one copy is retained;
the others are just links pointing to it.
A symbolic link is removed by rm. It removes the link, not the
file it points to.
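The figure-sharing setup described above might look as follows. This is an illustrative sketch: the directory names figures and talk1 and the file fig1.eps are invented, and readlink is assumed to be installed.

```shell
work=$(mktemp -d)
cd "$work"
mkdir figures talk1
echo "plot data" > figures/fig1.eps

# A relative target is resolved from the directory that holds the link.
ln -s ../figures/fig1.eps talk1/fig1.eps

via_link=$(cat talk1/fig1.eps)      # reading the link reads the real file
target=$(readlink talk1/fig1.eps)   # shows where the link points

rm talk1/fig1.eps                   # removes the link only
ls figures                          # fig1.eps is still there
```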
Symbolic links pointing to directories are handy for "emulating" a different directory structure, to support software that expects to find files in a place that does not exist on this system.
Hard links are generated by ln without the -s option. They
are not much used nowadays because they can point only to files, not
directories, and not across file systems. Usually one is better off using
symbolic links instead of hard links.
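For comparison, a hard link is simply a second name for the same file: both names share one inode, and the data stays on disk until the last name is removed. A small sketch with hypothetical file names:

```shell
work=$(mktemp -d)
cd "$work"
echo "shared contents" > original.txt
ln original.txt hardlink.txt        # no -s: hard link

# Both names report the same inode number.
inode_a=$(ls -i original.txt | awk '{print $1}')
inode_b=$(ls -i hardlink.txt | awk '{print $1}')

rm original.txt                     # removes one name only
survivor=$(cat hardlink.txt)        # the data is still reachable
echo "inodes: $inode_a $inode_b, contents: $survivor"
```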
To create a tarfile of a directory, do
tar cf tarfile.tar dir # tar create file
All the files in dir are recursively put in the tarfile. To
extract the contents of an existing tarfile in the current directory,
do
tar xf tarfile.tar # tar extract file
To view the contents of a tarfile without actually extracting it, do
tar tvf tarfile.tar # see contents only
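The three operations can be exercised together in a scratch directory. A minimal sketch; the directory unpack and the file note.txt are invented for the example.

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p dir/sub
echo "hello" > dir/sub/note.txt

tar cf tarfile.tar dir              # create
listing=$(tar tf tarfile.tar)       # list names without extracting
echo "$listing"

mkdir unpack
cd unpack
tar xf ../tarfile.tar               # extract into the current directory
restored=$(cat dir/sub/note.txt)
cd ..
```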
tar is also used to copy a directory recursively. One could also do
this with cp -r (see above), but then the permissions, modification
times etc. are reset to new values (cp -p should prevent this, but it
may not do so in every respect). In contrast, tar is a foolproof way
to make an exact copy of a directory in every respect:
cd # cd to home directory
cd .. # go one level up
tar cf - pjanhune | (cd /work/pjanhune; tar xf -)
This copies the directory pjanhune (the home directory of user
pjanhune) to a new place, under /work/pjanhune. It thus
creates /work/pjanhune/pjanhune. In the first tar the - stands for
standard output: the tarfile is written to stdout, which is piped
to the subshell command in parentheses. This subshell first changes
to /work/pjanhune, then unpacks the tarfile read from stdin (the -
in the second tar). The result is a copy of the directory made
without using temporary disk space.
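The same pipe can be tried on a small scale. In this sketch the directories src and dest are hypothetical stand-ins for pjanhune and /work/pjanhune; it uses && instead of ; so that nothing is unpacked if the cd fails, and adds the p option, which asks tar to keep the recorded permissions.

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p src/docs dest
echo "text" > src/docs/readme.txt
chmod 640 src/docs/readme.txt

# Pack to stdout, unpack from stdin inside dest.
tar cf - src | (cd dest && tar xpf -)

orig_mode=$(ls -l src/docs/readme.txt | cut -c1-10)
copy_mode=$(ls -l dest/src/docs/readme.txt | cut -c1-10)
echo "original: $orig_mode  copy: $copy_mode"
```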
The option v makes tar verbose: it lists what it is
doing. Sometimes the B option may be necessary when tar is
reading from a pipe (on IRIX systems at least); it may also help when
reading a tape tarfile written on a different system.
The program gzip gzips (compresses) a file:
gzip file # compress file, producing file.gz (and delete old)
gzip -9 file # compress as hard as possible, taking more time
gunzip file.gz # uncompress file.gz, produce file (and delete file.gz)
gunzip <file.gz >newfile # uncompress from stdin to stdout, not deleting anything
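A quick round trip shows which files each form leaves behind. This is a sketch with made-up file names; the test data is generated with awk only to have something compressible.

```shell
work=$(mktemp -d)
cd "$work"
awk 'BEGIN { for (i = 1; i <= 1000; i++) print i }' > numbers.txt
cp numbers.txt reference.txt

gzip -9 numbers.txt                  # numbers.txt is replaced by numbers.txt.gz
gunzip < numbers.txt.gz > copy.txt   # numbers.txt.gz is left in place

# cmp -s is silent and only sets the exit status.
if cmp -s reference.txt copy.txt; then roundtrip=ok; else roundtrip=differs; fi
echo "round trip: $roundtrip"
```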
Gzipped tarfiles are common and typically end with .tar.gz. The
shorter suffix .tgz is common in the Linux world. The GNU/Linux
version of tar can do the gzip step transparently with the z
option:
tar czf tarfile.tgz dir # create a gzipped tarfile
tar xzf tarfile.tgz # extract from a gzipped tarfile
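On a tar without the z option, the same result can be had by piping through gzip explicitly. The sketch below builds the archive both ways and compares the listings; the archive names with-z.tgz and with-pipe.tar.gz are invented, and the first command assumes a tar that understands z.

```shell
work=$(mktemp -d)
cd "$work"
mkdir dir
echo "payload" > dir/file.txt

tar czf with-z.tgz dir                    # z option, as above
tar cf - dir | gzip > with-pipe.tar.gz    # same idea without relying on z

list_z=$(tar tzf with-z.tgz)
list_pipe=$(gunzip < with-pipe.tar.gz | tar tf -)
echo "$list_pipe"
```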
To make a backup on a local tape device, one normally uses
cd ~/.. # cd to one level up from home dir
tar cf /dev/tape pjanhune
The tape device name may vary; it might also be /dev/rmt0,
for instance. You need write permission to it, which is typically
arranged by adding yourself to group "tape" or "disk" in
/etc/group, or by giving everyone enough permissions for
/dev/tape: chmod ugo+rw /dev/tape.
A backup is restored using tar xf. Be extremely careful NOT to
use tar xf inadvertently. If you have a tape in your drive, you
could easily anti-backup your files (i.e., lose your recent changes)
by doing this! Remember that c=Create New Tarfile, x=eXtract
Old Tarfile.
If the tape drive exists on a different machine, you can do one of the following:
(do cd ~/.. first)
tar cf - pjanhune | ssh -l pjanhune rem.fmi.fi 'dd of=/dev/tape'
tar cf - pjanhune | rsh -l pjanhune rem.fmi.fi 'dd of=/dev/tape'
tar cf rem.fmi.fi:/dev/tape pjanhune
In the first case the tarfile is piped to a Secure Shell command,
which logs in as pjanhune on rem.fmi.fi and runs dd there; dd copies
its standard input to the output file /dev/tape. In the second case
one uses the old rsh (remote shell) instead of ssh. rsh does not
encrypt the traffic. In the third case one relies on the rmt
(remote magnetic tape) daemon running on rem.fmi.fi.
One may have to set the buffer size for tar. See man tar to
find out how.
One may also have to set the buffer size for dd using
bs=8192 or something similar.
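The tar-to-dd pipe can be rehearsed locally before involving a real tape drive or a remote login. In this sketch an ordinary file called fake-tape is a hypothetical stand-in for /dev/tape, and the ssh step is left out entirely.

```shell
work=$(mktemp -d)
cd "$work"
mkdir pjanhune
echo "thesis draft" > pjanhune/notes.txt

# Write the archive through dd with an explicit block size.
tar cf - pjanhune | dd of=fake-tape bs=8192 2>/dev/null

# Read it back the same way and list what was stored.
contents=$(dd if=fake-tape bs=8192 2>/dev/null | tar tf -)
echo "$contents"
```

The 2>/dev/null only hides the record counts that dd prints on standard error.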
On some systems, writing to tape using dd does not work properly
(then cat probably does not either) and one has to use tar. If
the rmt daemon is not running either, probably for security
reasons, then one has to use NFS (network file system) to mount
the directories on rem.fmi.fi before the backup can succeed. Using NFS
for backups probably uses more network resources than the other methods.
It is usually not a good idea to use gzip with tape backups. If a single
bit error occurs, it can corrupt the whole backup. Plain tar
format is more tolerant of errors: in the best case only one file is
damaged by a tape bit error.
To copy files or directories from machine to machine one can use
ssh and scp (or the old, less safe rsh and rcp). To use scp,
the sshd daemon must be running on the remote machine. sshd
typically resides in /usr/local/sbin or /usr/sbin;
use locate or find to find out. If typing the password each
time is no bother, this is all that is needed. For example,
scp -r dir rem.fmi.fi:/tmp/pjanhune # copy dir to rem.fmi.fi
scp -r dir pj@rem:/tmp/pj # log in as user 'pj'
To avoid typing the password each time, one has to append the
local machine's public key file (named, for example, ~/.ssh/id_rsa.pub)
to the remote machine's ~/.ssh/authorized_keys file. Alternatively, one
can add the local machine's name and username to the remote machine's
~/.shosts file, which is less safe. One can also add them to ~/.rhosts,
which enables rcp to work and is even less safe. Check man ssh for details.
Directories are divided in file systems, which usually correspond to
physical disk units, but nowadays may also be completely virtual. To
see the file systems use df -k or mount.
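df can also be asked about a single directory, which answers the question "which file system is this on?". A small sketch; the -P option (portable output format) and the variable name cwd_fs are additions for this example, not part of the text above.

```shell
# -k reports sizes in kilobytes; -P keeps each file system on one line,
# so the last field of the second line is the mount point.
cwd_fs=$(df -kP . | awk 'NR == 2 { print $NF }')
echo "The current directory is on the file system mounted at: $cwd_fs"
```

A mount point containing spaces would confuse the awk expression, so this is meant for a quick look rather than for scripts.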
The command mount is used to mount local and remote (NFS) file
systems. The configuration file is /etc/fstab. mount -a examines
/etc/fstab and tries to mount everything defined there. umount is
used to unmount file systems. For example, a CD-ROM must be mounted
and unmounted this way.
Every Unix file and directory has read, write and execute (rwx) permission bits
for user (u), group (g) and others (o). To publish your ~/public_html directory for
the Web server or other people to see, do
cd
chmod -R og+r public_html
The -R means recursive, and others (o) and group (g) are given r
access to everything under public_html. This is not enough, however.
You must also add execute permission to public_html and every subdirectory:
find public_html -type d -exec chmod og+x {} ";"
If you do chmod -R og+rx public_html, the execute permission is
also added to every file, which is not dangerous but a bit ugly.
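The effect of the two steps can be checked on a scratch copy. This sketch sets umask 077 first so that the starting permissions are the same everywhere; the names images and index.html are invented.

```shell
work=$(mktemp -d)
cd "$work"
umask 077                            # start from private permissions for the demo
mkdir -p public_html/images
echo "<html></html>" > public_html/index.html

chmod -R og+r public_html                          # read access everywhere
find public_html -type d -exec chmod og+x {} ";"   # execute on directories only

dir_mode=$(ls -ld public_html/images | cut -c1-10)
file_mode=$(ls -l public_html/index.html | cut -c1-10)
echo "directory: $dir_mode  file: $file_mode"
```

Many chmod versions also accept a capital X, which adds execute permission only to directories and to files that are already executable, so chmod -R og+rX public_html does both steps at once.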
If you are root, you can change the owner and group of files using
chown and chgrp. Modern versions of chown allow setting the group
at the same time, as in
chown -R pjanhune:luser .
which changes the owner to pjanhune and the group to luser ("local
user") for all files under the current directory and all its
subdirectories. Some versions of chown also accept the older form
pjanhune.luser.
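The owner:group syntax can be tried without root by assigning files to the user and group that already own them, which changes nothing but exercises the command. A sketch under that assumption; the directory tree is hypothetical, and numeric IDs from id are used so that no account names need to exist.

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p tree/sub
touch tree/sub/file.txt

# Re-assign everything to the current user and primary group.
# A real change to another owner would need root.
chown -R "$(id -u):$(id -g)" tree

# ls -n prints numeric owner and group IDs in fields 3 and 4.
owner_uid=$(ls -ldn tree/sub/file.txt | awk '{print $3}')
echo "owner uid: $owner_uid"
```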