The awk command (named after its authors Aho, Weinberger, and Kernighan)
combines pattern matching with a C-like language to process text files
line by line and field by field. For example, to print the second column
(by default, columns are whitespace-separated substrings), do
awk <inputfile '{print $2}' >outputfile
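As a small sketch of field splitting, the following feeds a few made-up lines to awk instead of reading a file. The sample text and the variable names second and login are invented for illustration. The -F option changes the field separator from whitespace to the given character.

```shell
# Default splitting: any run of blanks separates fields.
second=$(printf 'alpha beta gamma\none   two   three\n' | awk '{print $2}')
echo "$second"      # prints "beta" and "two" on separate lines

# -F: makes the colon the field separator, as in passwd-style files.
login=$(printf 'root:x:0:0\n' | awk -F: '{print $1}')
echo "$login"       # prints "root"
```

The $(...) form of command substitution is assumed here; it is available in POSIX shells such as ksh and bash.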
An awk program usually consists of pattern-action
pairs. To sum the numbers in the first column, ignoring lines starting
with '#', do
awk 'BEGIN {s=0} /^#/ {next} /.*/ {s+=$1} END {print s}' <inputfile >outputfile
For details on regular expressions, see man regexp on most systems.
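To see the pattern-action pairs at work, the sketch below runs a shorter variant of the summing program on invented sample data. It relies on two awk defaults: variables start out as zero, so the BEGIN block can be dropped, and an action without a pattern applies to every line, so /.*/ can be dropped as well.

```shell
# Sample data is made up; lines starting with '#' are skipped by "next".
total=$(printf '# comment line\n3 apples\n4 pears\n# another comment\n5 plums\n' |
    awk '/^#/ {next} {s += $1} END {print s}')
echo "$total"       # prints 12
```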
sed is a stream editor which can substitute strings. For
example, to rename each .cpp file in the current directory
to a .C file, do
for i in *.cpp                # (Bourne-shell)
do
    # \. matches a literal dot; $ anchors the match at the end of the name
    mv $i `echo $i | sed -e 's/\.cpp$/.C/' | tr -d '\012'`
done
echo writes the current filename (for example, myfile.cpp) to stdout
for further processing by sed and tr. The general form of the sed
substitute command is s/oldstring/newstring/g. Here oldstring is a
regular expression, so the dot, which would otherwise match any
character, is preferably escaped with a backslash. If the trailing g
is given, every match on the line is replaced instead of only the first
(not needed in the above example). The final character translate (tr)
command removes the newline characters. This is not strictly needed,
because the shell strips trailing newlines from the output of a
backquoted command, but it is a good idiom to learn.
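The effect of the g flag and of escaping the dot can be checked directly on the command line. This is only a sketch; the filenames a.cpp.cpp and acpp.cpp are contrived for the demonstration.

```shell
# Without g only the first match on the line is replaced.
first=$(echo 'a.cpp.cpp' | sed -e 's/\.cpp/.C/')
echo "$first"       # prints a.C.cpp

# With g every match is replaced.
all=$(echo 'a.cpp.cpp' | sed -e 's/\.cpp/.C/g')
echo "$all"         # prints a.C.C

# An unescaped dot matches any character, so "acpp" is hit first.
unescaped=$(echo 'acpp.cpp' | sed -e 's/.cpp/.C/')
echo "$unescaped"   # prints .C.cpp
```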
The above example works in Bourne-type shells (sh, ksh, bash). In C-shell (csh, tcsh) there is a simpler way, and the syntax of the for-statement is different:
foreach i (*.cpp)             # (C-shell)
mv $i $i:r.C
end
The :r modifier "goes backward" one dot-separated word, that is, it
removes the last suffix. Thus if $i is myfile.cpp, $i:r becomes myfile
and $i:r.C is myfile.C.
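Bourne-type shells that follow POSIX (ksh, bash, and most modern sh implementations) offer a comparable shortcut through parameter expansion, which is not part of the examples above and is shown here only as a sketch. ${i%.*} removes the shortest trailing match of the pattern .* and so plays roughly the role of $i:r.

```shell
i=myfile.cpp
root=${i%.*}        # strip the last dot-separated suffix
new=${i%.cpp}.C     # strip a specific suffix, then append the new one
echo "$root"        # prints myfile
echo "$new"         # prints myfile.C
```

With this, the body of the rename loop could be written as mv "$i" "${i%.cpp}.C" without calling sed or tr.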
One can also use basename, which strips the directory part and, optionally, a given suffix (the 'type' field) from a filename.
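A minimal sketch of the basename approach follows; the path /tmp/project/myfile.cpp is hypothetical and need not exist, since basename only manipulates the string.

```shell
# basename drops the leading directories and, if given, the trailing suffix.
base=$(basename /tmp/project/myfile.cpp .cpp)
target="$base.C"
echo "$base"        # prints myfile
echo "$target"      # prints myfile.C
```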