- Some filters can have file arguments:
- string1 and string2 normally have the same number of characters.
A range of characters can be given using a special (limited) syntax with the [-] characters:
tr [a-z] [A-Z]
Unfortunately the shell treats [] as a filename wildcard, which is not intended here, so the arguments must be quoted (the shell removes the quotes before passing the arguments on).
tr '[a-z]' '[A-Z]'
translates lower-case letters to upper-case.
- Can use * to repeat a character as many times as required when translating to the same character, eg 0 in the following:
tr '[a-z]' '[0*]'
- To delete characters, eg all alphabetic characters:
tr -d '[a-zA-Z]'
- The complement of string1 can be specified by -c, and repeated output characters can be squeezed into a single character with -s.
Control characters can be specified as octal numbers of the form \012.
- Eg to put each word in lower case on a separate line (ie separated by a newline or \012 character) while deleting every other character (!:-)
tr '[A-Z]' '[a-z]' | tr -cs '[a-z]' '[\012*]'
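The tr forms above can be tried on a short string. This is only a sketch: the sample text is invented for illustration, and the results in the comments assume a tr that accepts the bracketed forms used in these notes (GNU tr does).

```shell
# Sample text, invented for illustration.
text='Hello, World! Hello'

# Lower case to upper case; the quotes stop the shell expanding [].
upper=$(printf '%s\n' "$text" | tr '[a-z]' '[A-Z]')

# Every lower-case letter becomes 0.
zeros=$(printf '%s\n' "$text" | tr '[a-z]' '[0*]')

# Delete all alphabetic characters.
rest=$(printf '%s\n' "$text" | tr -d '[a-zA-Z]')

# One lower-case word per line; runs of other characters become one newline.
words=$(printf '%s\n' "$text" | tr '[A-Z]' '[a-z]' | tr -cs '[a-z]' '[\012*]')

printf '%s\n' "$upper" "$zeros" "$rest" "$words"
```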
- As a filter, grep selects the lines from stdin that contain the search pattern and copies them to stdout; other lines are discarded.
- Egs
grep found
displays lines containing found.
grep '[Pp]hrase of interest'
displays lines containing phrase of interest or Phrase of interest. Quotes are needed so that the shell passes on the spaces and [].
- As a program reading its file arguments, grep is used to find files and/or show lines containing the search pattern.
grep '^function' file1.c file2.c
names each file and displays its lines with function at the beginning of the line.
- grep returns 0 if a match was found, 1 if none was found and 2 for an error such as a bad pattern. (Check with $ echo $?)
- grep has several options:
-i ignore case
-l list file names only
-c count of matching lines only
-n number lines
-v output lines that don't match pattern
-e pattern allows a pattern starting with -, or several patterns (one -e each)
- Eg to find processes run by user fred in BSD and SystemV respectively:
ps aux | grep fred | grep -v grep
ps -ef | grep fred | grep -v grep
Eg to edit C source files containing Fred:
vi `grep -l Fred *.c`
There are also fgrep (fast grep with no regular expressions) and egrep (extended regular expressions).
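A small sketch of grep as a filter, covering some of the options and the exit status. The sample lines are invented for illustration and are not from any real file.

```shell
# Hypothetical sample input, invented for illustration.
sample='phrase of interest
nothing here
Phrase of interest again
-dash at start'

# As a filter: keep only the matching lines.
matches=$(printf '%s\n' "$sample" | grep '[Pp]hrase of interest')

# -c counts matching lines, -v inverts the match, -n numbers the lines.
count=$(printf '%s\n' "$sample" | grep -c 'interest')
others=$(printf '%s\n' "$sample" | grep -v 'interest')
numbered=$(printf '%s\n' "$sample" | grep -n 'nothing')

# -e lets a pattern start with -.
dash=$(printf '%s\n' "$sample" | grep -e '-dash')

# Exit status: 0 when some line matched, 1 when none did.
if printf '%s\n' "$sample" | grep 'absent' >/dev/null; then
    status=0
else
    status=$?
fi
echo "$count matching lines, status $status for a missing pattern"
```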
- As a filter, head in
prog file | head -100 | filter2
reads only 100 lines from stdin and writes them to stdout.
- As a program with one file argument, head can drive a pipeline from the start of an input file:
head -1000 largeFile | filter1 | filter2
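Both uses of head can be sketched as follows. The numbers helper is hypothetical and only stands in for prog; the count is written as -n 3, the POSIX spelling of the historical -3 form used above.

```shell
# numbers N prints 1..N, one per line (hypothetical stand-in for "prog file").
numbers() { awk -v n="$1" 'BEGIN { for (i = 1; i <= n; i++) print i }'; }

# As a filter: pass on only the first 3 lines of stdin.
first=$(numbers 10 | head -n 3 | tr '\012' ' ')

# With a file argument: drive a pipeline from the start of a file.
tmp=$(mktemp)
numbers 10 > "$tmp"
count=$(head -n 5 "$tmp" | wc -l | tr -d ' ')
rm -f "$tmp"

echo "first lines: $first"
echo "lines passed down the pipeline: $count"
```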
- tail displays lines through to the end of a file.
- Commonly used to look at the end of a file, 30 lines in following:
tail -30 file
- Also used to skip the start of a file and display the rest. Eg to display from line 100000 to the end:
tail +100000 file
- Works as a filter. Eg start from line 2000 and use head to stop after the next 1000:
prog file | tail +2000 | head -1000 | filter
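A sketch of the three tail uses on a small input. The numbers helper is hypothetical; the line counts are written as -n 3 and -n +8, which is how current POSIX systems spell the historical -3 and +8 forms shown above.

```shell
# numbers N prints 1..N, one per line (hypothetical stand-in for "prog file").
numbers() { awk -v n="$1" 'BEGIN { for (i = 1; i <= n; i++) print i }'; }

# The last 3 lines.
last=$(numbers 10 | tail -n 3 | tr '\012' ' ')

# From line 8 to the end.
from=$(numbers 10 | tail -n +8 | tr '\012' ' ')

# Start at line 20, then let head stop after the next 5 lines.
window=$(numbers 100 | tail -n +20 | head -n 5 | tr '\012' ' ')

echo "$last| $from| $window"
```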
- sort as a filter:
prog file1 file2 | sort | filter2
- sort as program driving a pipeline:
sort file1 file2 | filter1 | filter2
- By default, each line is sorted using the Ascii collation sequence (or order).
- Display the Ascii order with man ascii.
- sort treats lines as records separated by newline characters.
- In the following examples, the input file is displayed in the left column and the output it produces in the right column. Eg sort sAscii
sAscii:    output:
abcc a     aa b
ab a       aabc 9
aabc 9     ab a
c d        abcc a
ba 3       abcc e
abcc e     ba 3
aa b       c d
- Can reverse the sort order, ie descending order, by sort -r sAscii
sAscii:    output:
abcc a     c d
ab a       ba 3
aabc 9     abcc e
c d        abcc a
ba 3       ab a
abcc e     aabc 9
aa b       aa b
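The two sAscii examples can be reproduced as a sketch. It assumes the fields are separated by a single space, and it sets LC_ALL=C so that sort uses plain Ascii order rather than the collation rules of the current locale.

```shell
# Recreate sAscii in a scratch directory (single-space separators assumed).
dir=$(mktemp -d)
printf '%s\n' 'abcc a' 'ab a' 'aabc 9' 'c d' 'ba 3' 'abcc e' 'aa b' > "$dir/sAscii"

# Ascending Ascii order, then descending with -r.
ascending=$(LC_ALL=C sort "$dir/sAscii")
descending=$(LC_ALL=C sort -r "$dir/sAscii")
rm -rf "$dir"

printf '%s\n' "$ascending" '--' "$descending"
```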
- For sorting numbers, the default Ascii order is inappropriate. Eg sort sNumbers
sNumbers:  output:
11          1
 1          23
21         05
05         10
10         11
 23        2
2          21
Notice that space comes before the digits (and the alphabetic characters) in the Ascii collation sequence, so the two lines that begin with a space come first. Also the integers are being treated as strings (so 2 was placed after 10!).
To sort numbers, an option is given, eg sort -n sNumbers
sNumbers:  output:
11         1
1          2
21         4
05         05
4          10
10         11
23         21
2          23
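The difference between string order and numeric order can be sketched with the same numbers. The list below is typed without leading spaces, which is an assumption made to keep the example simple; LC_ALL=C again selects plain Ascii order.

```shell
# The numbers from the example, one per line (no leading spaces assumed).
nums=$(printf '%s\n' 11 1 21 05 4 10 23 2)

# Default: compared as strings, character by character.
as_strings=$(printf '%s\n' "$nums" | LC_ALL=C sort | tr '\012' ' ')

# -n: compared by numeric value.
as_numbers=$(printf '%s\n' "$nums" | LC_ALL=C sort -n | tr '\012' ' ')

echo "strings: $as_strings"
echo "numbers: $as_numbers"
```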
- Fields in sort are separated by space and/or Tab characters.
- The following has 5 fields, numbered 1, 2, 3, 4 and 5:
a*b->c**->d->->e
where * represents a space and -> represents a Tab character.
- Sorting can require the use of certain fields before others, ie specifying the sort key order.
- Sort keys are usually specified by (skip,finish) pairs in the format +skip -finish, where skip is the number of fields to skip across and finish is the inclusive field to finish on:
+1 -3 skip 1 field (so start on field 2) and finish on field 3
+5 -6 skip 5 fields (so start on field 6) and finish on field 6
+0 -2 skip no fields (so start on field 1) and finish on field 2
But can imply finishing on the last field:
+4 for 5 fields is the same as +4 -5
- Eg for 5 fields, could specify a sort key order of fields 2 and 3, then field 1, then fields 4 and 5 by the pairs
sort +1 -3 +0 -1 +3 -5
- Can have more complicated orderings. Eg for 5 fields again, could specify a sort key order of field 2 as numbers, then fields 4 and 5 as Ascii (default) but in reverse order, then field 1 as numbers in reverse order, then field 3 by either of
sort +1 -2n +3 -5r +0 -1nr +2 -3
sort +1n -2 +3r -5 +0nr -1 +2 -3
- sort +1 -2nr +0 -1 sAscii produces:
sAscii:    output:
abcc a     aabc 9
ab a       ba 3
aabc 9     aa b
c d        ab a
ba 3       abcc a
abcc e     abcc e
aa b       c d
Notice the first field was needed to resolve the order of the lines whose second field did not contain a number (and so was taken as 0).
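The +skip -finish pairs are the historical notation. Current POSIX sort writes the same keys as -k start,finish with fields counted from 1, so +1 -2nr +0 -1 corresponds to -k 2,2nr -k 1,1. A sketch of the last example in that form, assuming single-space separators in sAscii and a sort that supports -k:

```shell
# Recreate sAscii in a scratch directory (single-space separators assumed).
dir=$(mktemp -d)
printf '%s\n' 'abcc a' 'ab a' 'aabc 9' 'c d' 'ba 3' 'abcc e' 'aa b' > "$dir/sAscii"

# Field 2 as numbers in reverse order, then field 1 in Ascii order.
# Historical spelling: sort +1 -2nr +0 -1 sAscii
keyed=$(LC_ALL=C sort -k 2,2nr -k 1,1 "$dir/sAscii")
rm -rf "$dir"

printf '%s\n' "$keyed"
```

In the same notation the five-field example sort +1 -3 +0 -1 +3 -5 would be written sort -k 2,3 -k 1,1 -k 4,5.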
- Changing the tabulation character (from space or Tab):
-t: changes tabulation character to :
- Ignoring case (or folding):
-f fold upper case to lower case
- Merging pre-sorted files is linear in time complexity and hence quicker than sorting all the files together. sort -m file1 file2 merges the pre-sorted files.
- It is sometimes convenient to specify an output file
-o outFile
- Subfields may be specified using dot notation on the fields. Eg
+2.4 -3.7 skip 2 fields and 4 characters of field 3; finish on 7th character of field 4
- Note, you can treat a line as one field by specifying a tabulation character that is never used on any line; the keys are then given as characters in that field.
- To discard duplicated lines after writing the first occurrence
-u unique output
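The remaining sort options can be sketched together. All file names and records here are invented for illustration, and the field key is written with -k, the current POSIX form.

```shell
dir=$(mktemp -d)

# -t: use : as the tabulation character, then sort on field 2 as numbers.
printf '%s\n' 'x:10' 'y:9' 'z:100' > "$dir/recs"
by_num=$(LC_ALL=C sort -t: -k 2,2n "$dir/recs" | tr '\012' ' ')

# -f: fold case, so a sorts before B.
plain=$(printf '%s\n' B a c | LC_ALL=C sort | tr '\012' ' ')
folded=$(printf '%s\n' B a c | LC_ALL=C sort -f | tr '\012' ' ')

# -u: write each distinct line once.
distinct=$(printf '%s\n' b a b c a | LC_ALL=C sort -u | tr '\012' ' ')

# -m merges files that are already sorted; -o names the output file.
printf '%s\n' 1 3 5 > "$dir/odd"
printf '%s\n' 2 4 6 > "$dir/even"
LC_ALL=C sort -m -n -o "$dir/merged" "$dir/odd" "$dir/even"
merged=$(tr '\012' ' ' < "$dir/merged")

rm -rf "$dir"
echo "$by_num| $plain| $folded| $distinct| $merged"
```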
- With no option, uniq prints one copy of each unique line to stdout.
- Duplicated lines are discarded (only adjacent lines are compared, so the input is normally sorted first).
- With -u only those lines which have no duplicates are printed.
- With -d only those lines which have duplicates are printed.
Egs uniq dups, uniq -u dups and uniq -d dups produce the following
dups:   default output:   -u output:   -d output:
a       a                 a            b
b       b                 c            ddd
b       c                 dd
c       dd
dd      ddd
ddd
ddd
ddd
uniq runs as a filter or as a program driving a pipeline.
There are options to compare fields and to count duplicated lines.
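The three uniq modes, plus the counting option -c, can be sketched on the dups example. The awk step is only there to tidy the padded output of -c and is not part of uniq itself.

```shell
# The dups example, one value per line (already in sorted order).
dups=$(printf '%s\n' a b b c dd ddd ddd ddd)

# One copy of every line.
plain=$(printf '%s\n' "$dups" | uniq | tr '\012' ' ')

# Only lines that are not repeated.
only_unique=$(printf '%s\n' "$dups" | uniq -u | tr '\012' ' ')

# Only lines that are repeated.
only_dups=$(printf '%s\n' "$dups" | uniq -d | tr '\012' ' ')

# -c prefixes each line with its count; awk rewrites that as count:line.
counts=$(printf '%s\n' "$dups" | uniq -c | awk '{ print $1 ":" $2 }' | tr '\012' ' ')

echo "$plain| $only_unique| $only_dups| $counts"
```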
- cut prints selected columns or fields to stdout.
- cut is available on SystemV and as a GNU program.
- cut runs as a filter or as a program driving a pipeline.
- For columns, use -clist where list is comma separated and/or ranges. Eg to output columns 3 to 7 and column 11 onwards (cut always writes columns in input order, whatever their order in the list):
cut -c7,3-6,11-
- Fields are separated by the Tab character. For fields, use -flist where list is comma separated and/or ranges. Eg to output fields 1, 3 and 7 (written in input order, whatever their order in the list):
cut -f3,7,1
- To change the delimiting (or tabulation) character use -dc, where c is the required character. Eg -d' ' to set to space and -d: to set to :.
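A sketch of the cut forms on invented one-line inputs. It also shows that cut writes the selected columns or fields in the order they occur in the input, not in the order they are listed.

```shell
# Columns: the list 7,3-6,11- selects columns 3 to 7 and 11 onwards.
cols=$(printf '%s\n' 'abcdefghijklmnop' | cut -c7,3-6,11-)

# Fields: Tab separated by default; 3,7,1 comes out as fields 1, 3, 7.
fields=$(printf 'f1\tf2\tf3\tf4\tf5\tf6\tf7\n' | cut -f3,7,1)

# -d changes the delimiter, here to : and to a space.
colon=$(printf '%s\n' 'root:x:0:0' | cut -d: -f1,3)
space=$(printf '%s\n' 'a b c' | cut -d' ' -f2)

printf '%s\n' "$cols" "$fields" "$colon" "$space"
```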
- paste puts files into columns, printing to stdout.
- paste is available on SystemV and as a GNU program.
- To put multiple files into side-by-side columns, use for example paste pfile1 pfile2 to produce
pfile1:    pfile2:    output:
a line 1   b line 1   a line 1        b line 1
a line 2   b line 2   a line 2        b line 2
a line 3              a line 3
- By default a Tab character is placed between the columns for tabulation.
- To change the delimiting (or tabulation) character use -dc, where c is the required character. Eg paste -d: pfile1 pfile2 produces
pfile1:    pfile2:    output:
a line 1   b line 1   a line 1:b line 1
a line 2   b line 2   a line 2:b line 2
a line 3              a line 3:
- To squash the lines of a file into a single line, use for example paste -s -d:, pfile1, which joins the three lines using : and , in turn as the separators, to produce
pfile1:    output:
a line 1   a line 1:a line 2,a line 3
a line 2
a line 3
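The three paste examples can be reproduced as a sketch by recreating pfile1 and pfile2 in a scratch directory.

```shell
dir=$(mktemp -d)
printf '%s\n' 'a line 1' 'a line 2' 'a line 3' > "$dir/pfile1"
printf '%s\n' 'b line 1' 'b line 2' > "$dir/pfile2"

# Side by side, with a Tab between the columns.
side_by_side=$(paste "$dir/pfile1" "$dir/pfile2")

# Side by side, with : between the columns.
with_colon=$(paste -d: "$dir/pfile1" "$dir/pfile2")

# -s joins the lines of one file, taking : and , in turn as separators.
squashed=$(paste -s -d:, "$dir/pfile1")

rm -rf "$dir"
printf '%s\n' "$side_by_side" "$with_colon" "$squashed"
```

The shorter file simply contributes empty columns once it runs out, which is why the last output line ends with the delimiter.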
- cut and paste can be used to reorder columns. Eg from a file with columns col1 col2 col3 to get col2 col1 col3:
cut -f2 file > col2
cut -f1,3 file > cols1+3
paste col2 cols1+3 > newfile
This can also be done as follows
cut -f2 file > col2
cut -f1,3 file | paste col2 - > newfile
where - as a filename means stdin.
(This is used in many other filters too.)
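A sketch of the second variant on a two-line, Tab-separated file. The file contents are invented for illustration.

```shell
dir=$(mktemp -d)

# A file with three Tab-separated columns.
printf 'a1\tb1\tc1\na2\tb2\tc2\n' > "$dir/file"

# Save column 2, then paste it in front of columns 1 and 3 read from stdin.
cut -f2 "$dir/file" > "$dir/col2"
cut -f1,3 "$dir/file" | paste "$dir/col2" - > "$dir/newfile"

result=$(cat "$dir/newfile")
rm -rf "$dir"
printf '%s\n' "$result"
```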