A Unix Cook-Booklet: Manipulating files

2. Manipulating files

A Unix file is a sequence of 8-bit bytes. Files have names and they are organized in directories. In this section we show how to manipulate files without looking at their contents.

2.1 Putting files in directories

mkdir dirname makes a new directory and rmdir dirname removes it. rmdir agrees to remove an empty directory only. cp copies files, mv moves them and rm removes them:


cp fn fnnew                    # copy a file, new name is fnnew
cp fn ../tex/fnnew             # copy, placing the copy in a different directory
cp fn1 fn2 fn3 ~/dir           # copy fn1,fn2,fn3 in dir, which is under home directory
cp -r dir1 dir2                # copy whole directory recursively
cp -r -p dir1 dir2             # preserve permissions etc.
mv fn fnnew                    # rename fn to be fnnew
mv fn1 fn2 ... dir             # move files to directory
mv dir1 dir2                   # move directory (SAME FILE SYSTEM)
mv dir1 dir2 ... dirN          # move directories (SAME FILE SYSTEM)
rm fn                          # remove a file
rm fn1 fn2                     # remove the named files
rm -r dir                      # remove directory recursively (all its contents)

Thus, if cp or mv is given more than two arguments, the last one must be an existing directory. Older versions of mv refuse to move entire directories across file systems, because this involves physically copying them and then removing the originals; many current versions (GNU mv, for instance) do the copying and removing for you.
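The commands above can be tried safely in a throwaway directory. The following is only a sketch: the names scratch, texdir, fn1 and fn2 are made up for the example, and mktemp -d is assumed to be available for creating the temporary directory.

```shell
#!/bin/sh
# Scratch-directory walk-through of mkdir, cp, mv, rm and rmdir.
# All file and directory names here are hypothetical.
set -e
scratch=$(mktemp -d)          # temporary playground
cd "$scratch"

mkdir texdir                  # the destination directory must exist first
echo one > fn1
echo two > fn2

cp fn1 fn2 texdir             # more than two arguments: the last one is a directory
mv fn1 fn1.old                # plain rename
rm fn2                        # fn2 is gone here, texdir/fn2 remains

# rmdir refuses to remove a directory that still has files in it
if rmdir texdir 2>/dev/null; then
    rmdir_result=removed
else
    rmdir_result=refused
fi
echo "rmdir on a non-empty directory: $rmdir_result"

rm -r texdir                  # rm -r removes the directory and its contents
ls                            # only fn1.old is left
```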

The cp, mv and rm commands accept the -i option, which makes them ask for confirmation before overwriting or removing a file. On some systems this is the default because an alias has been set up (alias cp 'cp -i' in C-shell; alias cp='cp -i' in the Bourne-Again Shell, bash). The -f flag can be used to force the commands through without asking questions.

Any of the commands cp, mv, rm can remove or overwrite data permanently. Either use the -i option or think twice before hitting Return.

2.2 Symbolic links

Symbolic links are created by ln -s truefile linkname. Symbolic links can point across file systems without restrictions, and they can point to directories as well as files. Symbolic links are handy, for example, if you use LaTeX to create your papers and seminar talks and want to reuse the same figures. LaTeX finds the figure files easily if they are in the current directory, and typically you keep the files of a single talk in its own directory. Without symbolic links you would have to copy all the figure files into each talk directory, wasting disk space. With symbolic links, only one copy is retained; the others are just links pointing to it.

A symbolic link is removed by rm. It removes the link, not the file it points to.
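As an illustration of the figure-sharing setup described above, here is a small sketch run in a temporary directory. The names figures, talk1 and fig1.eps are hypothetical. Note that a relative link target is interpreted relative to the directory that holds the link, not relative to where ln was run.

```shell
#!/bin/sh
# Symbolic link demo: one real figure file, one link in a talk directory.
set -e
scratch=$(mktemp -d)
cd "$scratch"
mkdir figures talk1
echo "figure data" > figures/fig1.eps

ln -s ../figures/fig1.eps talk1/fig1.eps   # target is relative to talk1/
linked=$(cat talk1/fig1.eps)               # reading goes through the link
ls -l talk1                                # shows fig1.eps -> ../figures/fig1.eps

rm talk1/fig1.eps                          # removes only the link
ls figures                                 # fig1.eps is still there
```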

Symbolic links pointing to directories are handy for ``emulating'' different directory structures, to support software that expects to find some files in a certain place which does not exist on this system.

Hard links are generated by ln without the -s option. They are not much used nowadays because they can point only to files, not directories, and not across file systems. Usually one is better off using symbolic links instead of hard links.
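For comparison, a hard link is simply a second name for the same file. The sketch below (file names original and hardlink are made up) shows that data written through one name is visible through the other, and that the data survives as long as at least one name remains.

```shell
#!/bin/sh
# Hard link demo: two names, one file.
set -e
scratch=$(mktemp -d)
cd "$scratch"

echo "first" > original
ln original hardlink          # no -s: create a hard link
echo "second" >> hardlink     # append through one name...
cat original                  # ...and see the change through the other

rm original                   # removes one name only
contents=$(cat hardlink)      # the data is still reachable
echo "$contents"
```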

2.3 Tarfiles and gzipped files

To create a tarfile of a directory, do


tar cf tarfile.tar dir             # tar create file

All the files in dir are recursively put in the tarfile. To extract the contents of an existing tarfile in the current directory, do


tar xf tarfile.tar                 # tar extract file

To view the contents of a tarfile without actually extracting it, do


tar tvf tarfile.tar                # see contents only

Tar is also used to copy a directory recursively. One could also do this with cp -r (see cp -r above), but then the permissions, modification times etc. are reset to new values (cp -p should prevent this, but it may not do so in every respect). Anyway, tar is a foolproof way to make an exact copy of a directory in every respect:


cd                                                # cd to home directory
cd ..                                             # go one level up
tar cf - pjanhune | (cd /work/pjanhune; tar xf -)

This copies the directory pjanhune (the home directory of user pjanhune) to a new place, under /work/pjanhune. It thus creates /work/pjanhune/pjanhune. The - stands for standard output in the first tar: the tarfile is written to stdout. The stdout is piped to the subshell command in parentheses. This subshell first changes to /work/pjanhune, then unpacks the tarfile read from stdin. The result is a copy of the directory, made without using temporary disk space.
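The same pipe pattern can be tried on a small scale. This sketch uses hypothetical directories src and dest inside a temporary directory. It makes two small variations on the command above: cd dest && tar ... so that nothing is unpacked if the cd fails, and the p option on the extracting tar, which asks tar to restore permissions exactly instead of applying the current umask.

```shell
#!/bin/sh
# Copy a directory tree with a tar pipe, without an intermediate tarfile.
set -e
scratch=$(mktemp -d)
cd "$scratch"
mkdir -p src/sub dest
echo hello > src/sub/file.txt
chmod 640 src/sub/file.txt

# First tar writes the archive to stdout, second reads it from stdin.
tar cf - src | (cd dest && tar xpf -)

ls -l src/sub dest/src/sub    # compare the original and the copy
```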

The option v makes tar verbose: it lists what it is doing. The B option may sometimes be necessary when tar is reading from a pipe (on IRIX systems at least); it may also help when reading a tape tarfile written on a different system.

The program gzip gzips (compresses) a file:


gzip file               # compress file, producing file.gz (and delete old)
gzip -9 file            # try heavily, use longer time
gunzip file.gz          # uncompress file.gz, produce file (and delete file.gz)
gunzip <file >newfile   # uncompress from stdin to stdout, not deleting anything

Gzipped tarfiles are common and typically end with .tar.gz. The shorter convention .tgz is common in the Linux world. The GNU/Linux version of tar can do the gzip transparently by using the z option:


tar czf tarfile.tgz dir    # create a gzipped tarfile
tar xzf tarfile.tgz        # extract from a gzipped tarfile
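If the local tar does not know the z option, the same effect can be had by piping through gzip and gunzip. The following sketch assumes only that tar, gzip and gunzip are installed; the directory names dir and out are made up.

```shell
#!/bin/sh
# Gzipped tarfile round trip without relying on tar's z option.
set -e
scratch=$(mktemp -d)
cd "$scratch"
mkdir -p dir out
echo "some data" > dir/a.txt

tar cf - dir | gzip > tarfile.tgz             # equivalent to: tar czf tarfile.tgz dir
gunzip < tarfile.tgz | tar tvf -              # list the contents only
gunzip < tarfile.tgz | (cd out && tar xf -)   # extract under out/
```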

2.4 Backups and such

To make a backup on a local tape device, one normally uses


cd ~/..                   # cd to one level up from home dir
tar cf /dev/tape pjanhune

The tape device name may vary; it might also be /dev/rmt0, for instance. You need write permission to it, which is typically obtained by adding yourself to group ``tape'' or ``disk'' in /etc/group, or by giving enough permissions to everyone for /dev/tape: chmod uog+rw /dev/tape.

A backup is restored using tar xf. Be extremely careful NOT to use tar xf inadvertently. If you have a tape in your drive, you could easily anti-backup your files (i.e., lose your recent changes) by doing this! Remember that c=Create New Tarfile, x=eXtract Old Tarfile.

If the tape drive exists on a different machine, you can do one of the following:


cd ~/..                   # do this first: go one level up from home dir
tar cf - pjanhune | ssh -l pjanhune rem.fmi.fi 'dd of=/dev/tape'
tar cf - pjanhune | rsh -l pjanhune rem.fmi.fi 'dd of=/dev/tape'
tar cf rem.fmi.fi:/dev/tape pjanhune

In the first case the tarfile is piped to a Secure Shell command, which logs in as pjanhune on rem.fmi.fi and runs dd there; dd copies its standard input to the output file /dev/tape. In the second case one uses the old rsh (remote shell) instead of ssh. Rsh does not encrypt the traffic. In the third case one relies on the rmt (remote magnetic tape) daemon running on rem.fmi.fi.

One may have to set the buffer size for tar. See man tar to find out how. One may also have to set the buffer size for dd using bs=8192 or something similar.
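The tar-to-dd pipe, including the bs setting, can be rehearsed without a tape drive. In this sketch a regular file named fake-tape stands in for /dev/tape; that file, and the small pjanhune directory created here, are assumptions made purely for the demonstration. dd reports its record counts on stderr.

```shell
#!/bin/sh
# Rehearse a tape backup with a regular file in place of the tape device.
set -e
scratch=$(mktemp -d)
cd "$scratch"
mkdir pjanhune
echo "thesis draft" > pjanhune/notes.txt
tape="$scratch/fake-tape"     # hypothetical stand-in for /dev/tape

tar cf - pjanhune | dd of="$tape" bs=8192   # write the archive in 8192-byte blocks
dd if="$tape" bs=8192 | tar tvf -           # read it back and list the contents
```

Listing the archive right after writing it is a cheap way to check that the backup is readable before it is needed.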

On some systems, writing to tape using dd does not work properly (then cat probably does not either) and one has to use tar. If the rmt daemon is not running either, probably for security reasons, then one has to use NFS (network file system) to mount the directories on rem.fmi.fi before the backup can succeed. Backups over NFS probably use more network resources than the other methods.

It is usually not a good idea to use gzip with tape backups: a single bit error can corrupt the whole backup. Plain tar format is more tolerant of errors; in the best case a tape bit error damages only one file.

2.5 Remote copying

To copy files or directories from machine to machine one can use ssh and scp (or the old, less safe rsh and rcp). To use scp, the sshd daemon must be running on the remote machine. sshd typically resides in /usr/local/sbin or /usr/sbin; use locate or find to find out. If typing the password each time is no bother, this is all that is needed. For example,


scp -r dir rem.fmi.fi:/tmp/pjanhune    # copy dir to rem.fmi.fi
scp -r dir pj@rem:/tmp/pj              # log in as user 'pj'

To avoid typing the password each time, add your public key from the local machine (a file under ~/.ssh ending in .pub, such as ~/.ssh/id_rsa.pub; the exact name depends on the key type) to the remote machine's ~/.ssh/authorized_keys file. Alternatively, one can add the local machine's name and username to the remote machine's ~/.shosts file, which is less safe. One can also add them to ~/.rhosts, which enables rcp to work and is even less safe. Check man ssh for details.

2.6 File systems

Directories are grouped into file systems, which usually correspond to physical disk units, but nowadays may also be completely virtual. To see the file systems, use df -k or mount.

The command mount is used to mount local and remote (NFS) file systems. The configuration file is /etc/fstab. mount -a examines /etc/fstab and tries to mount everything defined there. umount is used to unmount file systems. For example, a CD-ROM must be mounted and unmounted this way.
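Since a move within one file system is a simple rename while a move across file systems means copying, it can be useful to check where two directories live. The sketch below relies on the POSIX output format of df (the -P option), in which the mount point is the last field of the second line; the helper name mount_point is made up, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Report the mount point of the file system holding a given directory.
mount_point() {
    df -kP "$1" | awk 'NR==2 {print $NF}'
}

here=$(mount_point .)
tmp=$(mount_point /tmp)
echo "current directory is on the file system mounted at $here"
echo "/tmp is on the file system mounted at $tmp"

if [ "$here" = "$tmp" ]; then
    echo "same file system: mv between them is a plain rename"
else
    echo "different file systems: mv between them has to copy the data"
fi
```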

2.7 Permissions and ownership

Every Unix file and directory has read, write and execute (rwx) permission bits for user (u), group (g) and others (o). To publish your ~/public_html directory for the Web server or other people to see, do


cd
chmod -R og+r public_html

The -R means recursive, and others (o) and group (g) are given r access to everything under public_html. This is not enough: you must also add execute permission to every subdirectory:


find public_html -type d -exec chmod og+x {} ";"

If you do chmod -R og+rx public_html, the execute permission is also added to every file, which is not dangerous but a bit ugly.
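A possible shortcut, not used in the text above, is the capital X permission letter of chmod: it adds execute permission only to directories (and to files that already have some execute bit set), so one command covers both steps. The sketch below tries this on a made-up public_html tree in a temporary directory; umask 077 is set only so that the starting permissions are known.

```shell
#!/bin/sh
# chmod with capital X: directories become searchable, plain files do not
# become executable.
set -e
umask 077                     # start from private permissions
scratch=$(mktemp -d)
cd "$scratch"
mkdir -p public_html/pics
echo "<html></html>" > public_html/index.html

chmod -R og+rX public_html

dir_mode=$(ls -ld public_html/pics | cut -c1-10)
file_mode=$(ls -l public_html/index.html | cut -c1-10)
echo "directory: $dir_mode"   # drwxr-xr-x
echo "file:      $file_mode"  # -rw-r--r--
```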

If you are root, you can change the owner and group of files using chown and chgrp. Modern versions of chown allow setting the group at the same time, with the owner and group separated by a colon (older versions also accept a dot as the separator). For example,


chown -R pjanhune:luser .

changes the owner of all files to pjanhune and the group to luser (``local user'') under the current directory and all its subdirectories.
