The command awk
(named after its authors Aho, Weinberger and Kernighan) combines pattern
matching with a C-like language to process text files line by line and
field by field. For example, to print the second column (by default, columns
are any whitespace-separated substrings), do
awk '{print $2}' <inputfile >outputfile
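A quick way to see the default field splitting is to feed awk a couple of hand-made lines. The sample data below is made up for illustration, and the sketch uses POSIX $(...) command substitution. Runs of blanks and tabs both count as a single separator:

```shell
# Made-up sample input: fields separated by mixed spaces and tabs.
sample=$(printf 'alpha   10\tx\nbeta\t20  y\n')

# '{print $2}' has no pattern, so the action runs for every input line.
second=$(printf '%s\n' "$sample" | awk '{print $2}')
printf '%s\n' "$second"    # prints 10 and 20, one per line
```

To split on some other character, awk -F sets the field separator, e.g. awk -F: '{print $1}' /etc/passwd prints the login names.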
The program passed to awk
usually consists of pattern-action
pairs: the action in braces is executed for each input line that the
pattern matches. To sum the numbers in the first column, ignoring lines
that start with '#', do
awk 'BEGIN {s=0} /^#/ {next} /.*/ {s+=$1} END {print s}' <inputfile >outputfile
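The special patterns BEGIN and END match once before the first input line
is read and once after the last line has been processed. The other patterns
here are regular expressions written between slashes; /.*/ matches every
line, so it could simply be omitted. For the regular-expression syntax, see
man regexp (man 7 regex on Linux systems).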
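The summing program can be tried on a few made-up lines. In this sketch the /.*/ pattern is dropped, since an omitted pattern matches every line just the same; the data and variable names are purely illustrative:

```shell
# Made-up input: one comment line, then three data lines.
data=$(printf '# value label\n1 a\n2 b\n3 c\n')

# BEGIN runs once before any input, END once after the last line.
# /^#/ {next} skips comment lines; the pattern-less action runs on the rest.
total=$(printf '%s\n' "$data" | awk 'BEGIN {s=0} /^#/ {next} {s+=$1} END {print s}')
echo "$total"    # prints 6
```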
The command sed
is a stream editor which, among other things, can substitute strings. For
example, to rename every .cpp file in the current directory to the
corresponding .C file, do
for i in *.cpp        # (Bourne shell)
do
    # $ anchors the match at the end of the name; quotes protect names with blanks
    new=`echo "$i" | sed -e 's/\.cpp$/.C/' | tr -d '\012'`
    mv "$i" "$new"
done
The echo command writes the current file name (e.g. myfile.cpp) to stdout,
where it is processed further by sed and tr. The sed substitute command has
the form
s/oldstring/newstring/g
where oldstring is a regular expression. An unescaped dot matches any
character, so a literal dot should be escaped with a backslash (\.). The
trailing g flag makes sed replace every match on a line instead of only the
first one; it is not needed in the example above. The final tr (translate
characters) command deletes newline characters (octal 012). Strictly
speaking this is redundant, because command substitution already strips
trailing newlines, but deleting characters with tr -d is a useful idiom to
learn.
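Both points about the substitute command, escaping the dot and the g flag, can be checked on a few made-up strings. This sketch only pipes literal text through sed, so no files are touched:

```shell
# Unescaped dot: '.' matches any character, so "acpp" is matched first.
loose=$(echo 'acpp.cpp' | sed -e 's/.cpp/.C/')      # .C.cpp
# Escaped dot: only the literal ".cpp" is matched.
strict=$(echo 'acpp.cpp' | sed -e 's/\.cpp/.C/')    # acpp.C

# Without g only the first match on the line is replaced; with g, all of them.
first=$(echo 'a.cpp b.cpp' | sed -e 's/\.cpp/.C/')  # a.C b.cpp
all=$(echo 'a.cpp b.cpp' | sed -e 's/\.cpp/.C/g')   # a.C b.C

printf '%s\n' "$loose" "$strict" "$first" "$all"
```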
The above example works in Bourne-type shells (sh, ksh, bash). The C shell (csh, tcsh) offers a simpler way, and its loop syntax is different (foreach instead of for):
foreach i (*.cpp)     # (C shell)
    mv $i $i:r.C
end
The :r modifier removes the last dot-separated suffix, leaving the root of
the name: if $i is myfile.cpp, then $i:r is myfile and $i:r.C is myfile.C.
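In Bourne-type shells, basename can do a similar job: given a suffix as its
second argument, it strips that 'type' field from a file name, so
basename myfile.cpp .cpp prints myfile.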
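As a sketch, the Bourne-shell rename loop can use basename instead of sed. To keep the demonstration harmless it works in a scratch directory created with mktemp -d (widely available, though not required by POSIX), uses made-up file names, and uses POSIX $(...) rather than backticks so the nested quotes stay well-defined:

```shell
# Work in a throwaway directory so no real files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch one.cpp two.cpp           # made-up example files

for i in *.cpp
do
    # basename NAME SUFFIX strips SUFFIX (and any leading directory part).
    mv "$i" "$(basename "$i" .cpp).C"
done

renamed=$(ls)
echo "$renamed"                 # one.C and two.C, one per line

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

POSIX shells can also do this without starting a subprocess: the parameter expansion ${i%.cpp} removes a trailing .cpp, much like the C shell's $i:r.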