sed is commonly used to search for and replace substrings in text files, and to delete or extract lines from them.
SED is a Stream EDitor: it works in "stream" mode, processing the input line by line. This gives good performance and low memory use, but prevents sed from having an overview of the entire file.
SED Processing Steps
From the man page, the synopsis is:
sed [OPTION]... {script-only-if-no-other-script} [input-file]...
sed receives a script which contains all actions to be performed on the input stream.
There are two ways to pass this script to sed:
From the sed command line, with the -e option. The actions of the script may be separated with semicolons and typed directly on the command line, quoted so that the shell does not interpret the semicolons itself: sed -e 'action1; action2; action3'
or each action may be preceded by its own -e: sed -e 'action1' -e 'action2' -e 'action3'
From an external file (e.g. myscript.sed) containing the script, with the -f option: sed -f script-file. Commands are then read from the file, which improves readability for large scripts and allows the script to be reused.
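As a minimal sketch, assuming a POSIX-compatible sed, the script below stores two commands in a file, runs them with -f, and checks that the same commands given with -e produce the same output. The file names myscript.sed and input.txt, and their contents, are hypothetical.

```shell
tmp=$(mktemp -d)

# Hypothetical script file: one sed command per line.
cat > "$tmp/myscript.sed" <<'EOF'
1d
s/world/sed/
EOF

# Hypothetical input file.
printf 'first line\nhello world\n' > "$tmp/input.txt"

# Commands read from the script file with -f ...
result=$(sed -f "$tmp/myscript.sed" "$tmp/input.txt")
# ... and the same commands typed on the command line with -e.
same=$(sed -e '1d' -e 's/world/sed/' "$tmp/input.txt")
printf '%s\n' "$result"    # prints: hello sed
```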
There are two choices for the sed output stream:
The first method applies the command to an input stream and redirects the resulting lines to an output stream. For example, sed may be applied to an input file, and its output redirected to another file.
The second method is the "direct" method, with the -i option: sed -i applies the command directly to the input file and modifies it.
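Here is a short sketch of both output methods, assuming GNU or BSD sed; the file fruits.txt and its contents are hypothetical. Attaching a suffix directly to -i (here .bak) makes sed keep a backup of the original file, and this spelling is accepted by both GNU and BSD sed.

```shell
tmp=$(mktemp -d)
printf 'apple\nbanana\n' > "$tmp/fruits.txt"

# Method 1: send the output stream to another file; fruits.txt is not modified.
sed 's/apple/cherry/' "$tmp/fruits.txt" > "$tmp/fruits-new.txt"

# Method 2: edit fruits.txt in place with -i; the .bak suffix makes sed keep
# the original content in fruits.txt.bak.
sed -i.bak 's/banana/mango/' "$tmp/fruits.txt"

cat "$tmp/fruits-new.txt"   # cherry, banana
cat "$tmp/fruits.txt"       # apple, mango
```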
Let's create a simple file to test sed commands, and display the results in a terminal window:
cat > hello.txt <<- EOF
This is the first line.
Line 2 ? argh
Line 3 ? hi
Line 4 starts with blank characters.
Line 6 # This is a comment.
# This is the last comment starting with blank characters.
EOF
The file is saved as hello.txt. To display it, we can use the cat command:
cat hello.txt
This is the first line.
Line 2 ? argh
Line 3 ? hi
Line 4 starts with blank characters.
Line 6 # This is a comment.
# This is the last comment starting with blank characters.
The d command deletes the selected lines. Since sed works on a data stream, nothing is really removed from the file: sed simply does not print the line and jumps to the next one.
For example, let's delete line 3:
sed '3d' hello.txt
This is the first line.
Line 2 ? argh
Line 4 starts with blank characters.
Line 6 # This is a comment.
# This is the last comment starting with blank characters.
It worked! Line 3 was deleted.
Let's delete lines 2 to 3:
sed '2,3d' hello.txt
This is the first line.
Line 4 starts with blank characters.
Line 6 # This is a comment.
# This is the last comment starting with blank characters.
sed jumped from line 1 directly to line 4.
The -e option allows several commands to be executed in sequence:
sed -e '1d' -e '3d' hello.txt
is the same as
sed -e '1d; 3d' hello.txt
Line 2 ? argh
Line 4 starts with blank characters.
Line 6 # This is a comment.
# This is the last comment starting with blank characters.
Lines 1 and 3 were deleted.
Lines can also be selected with a pattern. Let's delete every line containing "comment":
sed '/comment/d' hello.txt
This is the first line.
Line 2 ? argh
Line 3 ? hi
Line 4 starts with blank characters.
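Used as an address, $ stands for the last line. Let's delete everything from line 4 to the end of the file:
sed '4,$ d' hello.txt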
This is the first line.
Line 2 ? argh
Line 3 ? hi
!
Let's delete all lines outside a specified range, here every line except the 2nd to the 4th. The symbol ! negates the condition:
sed '2,4!d' hello.txt
Line 2 ? argh
Line 3 ? hi
Line 4 starts with blank characters.
In regular expressions, the symbol ^ matches the beginning of a line and $ matches the end of a line, so the pattern ^$ matches an empty line. Patterns are written between slash characters:
sed '/^$/d' hello.txt
This is the first line.
Line 2 ? argh
Line 3 ? hi
Line 4 starts with blank characters.
Line 6 # This is a comment.
# This is the last comment starting with blank characters.
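In the listing above, the output looks the same as the input. As a sketch with hypothetical input built with printf, the effect of /^$/d is easier to see; it also shows that a line holding only a space is not matched by ^$, since that line is not empty.

```shell
# Hypothetical input: line 2 is empty, line 4 holds a single space.
out=$(printf 'one\n\nthree\n \nfive\n' | sed '/^$/d')
printf '%s\n' "$out"
# The empty line is gone; the line containing a space is still there.
```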
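Let's delete every line starting with "Line". Line 4 is kept because it starts with blank characters, not with "Line":
sed '/^Line/ d' hello.txt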
This is the first line.
Line 4 starts with blank characters.
Line 6 # This is a comment.
# This is the last comment starting with blank characters.
sed '/[ih]$/d' hello.txt
This is the first line.
Line 4 starts with blank characters.
Line 6 # This is a comment.
# This is the last comment starting with blank characters.
[ih] matches either 'i' or 'h', so this command deletes every line ending with 'i' or 'h': here "Line 2 ? argh" and "Line 3 ? hi".
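Two patterns separated by a comma select an interval: it starts at a line matching the first pattern and ends at the next line matching the second one. Combined with !, every line outside the interval is deleted:
sed '/^Line/,/characters\.$/!d' hello.txt
Line 2 ? argh
Line 3 ? hi
Line 4 starts with blank characters.
Line 6 # This is a comment.
# This is the last comment starting with blank characters.
Once an interval is closed, sed looks for the first pattern again: "Line 6" also starts with "Line", so a second interval opens there and runs to the last line, which ends with "characters.".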
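To see how sed reopens an interval, here is a sketch with hypothetical START/END marker lines: both blocks are kept and everything outside them is deleted.

```shell
# Hypothetical input with two START...END blocks.
out=$(printf 'a\nSTART\nb\nEND\nc\nSTART\nd\nEND\ne\n' | sed '/^START/,/^END/!d')
printf '%s\n' "$out"
# Only the lines of the two blocks remain; a, c and e are deleted.
```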
First, let's ask sed to apply the p (print) command to every line of the input file, without any filtering:
sed -e 'p' hello.txt
This is the first line.
This is the first line.
Line 2 ? argh
Line 2 ? argh
Line 3 ? hi
Line 3 ? hi
Line 4 starts with blank characters.
Line 4 starts with blank characters.
Line 6 # This is a comment.
Line 6 # This is a comment.
# This is the last comment starting with blank characters.
# This is the last comment starting with blank characters.
Every line is duplicated. Here is why: by default, sed prints each resulting line on standard output, unless it is invoked with the -n option. The p command explicitly asks sed to print the line as well, so each line is printed twice.
Let's try again with the -n option:
sed -n -e 'p' hello.txt
This is the first line.
Line 2 ? argh
Line 3 ? hi
Line 4 starts with blank characters.
Line 6 # This is a comment.
# This is the last comment starting with blank characters.
Lines are no longer duplicated.
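Combined with a line address, -n and p print only the selected lines. A sketch with a hypothetical four-line file; the last check compares '$p' with tail -n 1.

```shell
tmp=$(mktemp -d)
printf 'l1\nl2\nl3\nl4\n' > "$tmp/lines.txt"

second=$(sed -n '2p' "$tmp/lines.txt")    # only line 2
middle=$(sed -n '2,3p' "$tmp/lines.txt")  # lines 2 and 3
last=$(sed -n '$p' "$tmp/lines.txt")      # last line only
printf '%s\n' "$second" "$middle" "$last"
```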
The p command can be used as the counterpart of the d command, for example to pick out all lines containing a given word. Let's try again with the word "comment":
sed -n '/comment/p' hello.txt
Line 6 # This is a comment.
# This is the last comment starting with blank characters.
The negated d command produces the same result:
sed '/comment/!d' hello.txt
Line 6 # This is a comment.
# This is the last comment starting with blank characters.
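A quick sketch, on hypothetical input, checking that the two sed forms above and grep select the same lines.

```shell
tmp=$(mktemp -d)
printf 'keep this comment\nno match here\nanother comment\n' > "$tmp/words.txt"

with_p=$(sed -n '/comment/p' "$tmp/words.txt")   # print matching lines only
with_d=$(sed '/comment/!d' "$tmp/words.txt")     # delete non-matching lines
with_grep=$(grep 'comment' "$tmp/words.txt")     # same selection with grep
printf '%s\n' "$with_p"
```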
p is mainly used for two reasons:
sed has several commands, but the substitute command is the most widely used, because the d and p commands can often be replaced by other tools such as grep, head, tail or tr.
The substitute command replaces text matching a pattern in the input stream (a text file, for example) with a new value. The pattern may be a regular expression.
s/pattern/replacement/
By default, only the first occurrence of the pattern on each line is replaced, unless the g flag is added at the end of the command:
s/pattern/replacement/g
You can also choose to replace only the third occurrence:
s/pattern/replacement/3
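A minimal sketch of the three forms on a hypothetical two-line input. Since the substitution applies to each line separately, the occurrence count starts again on every line.

```shell
# Hypothetical input: two identical lines, each with three "a".
first=$(printf 'a-a-a\na-a-a\n' | sed 's/a/X/')    # first occurrence per line
all=$(printf 'a-a-a\na-a-a\n' | sed 's/a/X/g')     # every occurrence
third=$(printf 'a-a-a\na-a-a\n' | sed 's/a/X/3')   # third occurrence per line
printf '%s\n' "$first" "$all" "$third"
```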
Let's create a test file:
cat > testS.txt <<- EOF
It the sky we look upon now now
Should tumble and fall
All of the mountains may crumble May crumble to the sea
EOF
Let's replace the first occurrence of "crumble" on each line with "fall":
sed 's/crumble/fall/' testS.txt
It the sky we look upon now now
Should tumble and fall
All of the mountains may fall May crumble to the sea
Let's add the g flag to replace all occurrences on the line:
sed 's/crumble/fall/g' testS.txt
It the sky we look upon now now
Should tumble and fall
All of the mountains may fall May fall to the sea
And to replace only the second occurrence on the line:
sed 's/crumble/fall/2' testS.txt
It the sky we look upon now now
Should tumble and fall
All of the mountains may crumble May fall to the sea
Here is our previous text again:
This is the first line.
Line 2 ? argh
Line 3 ? hi
Line 4 starts with blank characters.
Line 6 # This is a comment.
# This is the last comment starting with blank characters.
To remove comments and empty lines, we need two commands, both using regular expressions:
-e 's/#.*//'
replaces any comment with an empty string: this command removes every character from the "#" character to the end of the line. The symbol . matches any character, and the symbol * means 0 or more occurrences of the preceding item, so .* matches any sequence of characters.
-e '/^$/ d'
deletes the empty lines.
Let's see the result :
sed -e 's/#.*//' -e '/^$/ d' hello.txt
This is the first line.
Line 2 ? argh
Line 3 ? hi
Line 4 starts with blank characters.
Line 6
To remove all space and tab characters starting or ending a line, we need two more substitute commands:
-e 's/^[ ]*//'
-e 's/[ ]*$//'
You have to type a space character and a tab character explicitly inside the brackets: ['Space key''Tab key']. The bracket expression [ ] (with a space and a tab inside) then matches either a space or a tab, and the * placed after the brackets means that these characters may be repeated 0 or more times. The POSIX character class [[:blank:]] matches the same two characters and avoids typing a literal tab.
Let's clean our text of comments, blank lines, and tab and space characters starting or ending lines:
sed -e 's/#.*//' -e 's/^[ ]*//' -e 's/[ ]*$//' -e '/^$/ d' hello.txt
This is the first line.
Line 2 ? argh
Line 3 ? hi
Line 4 starts with blank characters.
Line 6
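A literal tab is easy to lose when copying a command, so here is a sketch of the same cleanup written with [[:blank:]] (space or tab). The input is hypothetical and built with printf so that the tab is explicit.

```shell
# Hypothetical input: a leading tab and a trailing space on line 1, a comment
# after code on line 2, an indented comment-only line, and an empty line.
out=$(printf '\tindented line \ncode # trailing comment\n  # only a comment\n\nlast\n' |
  sed -e 's/#.*//' \
      -e 's/^[[:blank:]]*//' \
      -e 's/[[:blank:]]*$//' \
      -e '/^$/ d')
printf '%s\n' "$out"
```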
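The order of operations is important: the comment must be removed before the trailing blanks, since "Line 6 # This is a comment." only ends with a space once its comment is gone, and the empty-line deletion must come last, since a line holding only blanks and a comment becomes empty only after the three substitutions.
Together, the four commands remove all blank lines, comments, and tabs or spaces at the beginning or the end of a line.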
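A sketch to check the point about order, on a hypothetical comment-only line: with '/^$/ d' placed first, the line is not yet empty when the test runs, so an empty line is left in the output.

```shell
# Deletion of empty lines placed last: the comment-only line disappears.
good=$(printf '  # only a comment\nkept\n' |
  sed -e 's/#.*//' -e 's/^[[:blank:]]*//' -e '/^$/ d')
# Deletion placed first: the line only becomes empty afterwards and is printed.
bad=$(printf '  # only a comment\nkept\n' |
  sed -e '/^$/ d' -e 's/#.*//' -e 's/^[[:blank:]]*//')
printf 'good: [%s]\n' "$good"
printf 'bad: [%s]\n' "$bad"
```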
Test
sed '/\?$/d' hello.txt
This is the first line.
Line 2 ? argh
Line 3 ? hi
Line 4 starts with blank characters.
Line 6 # This is a comment.
# This is the last comment starting with blank characters.
sed '/?$/d' hello.txt
This is the first line.
Line 2 ? argh
Line 3 ? hi
Line 4 starts with blank characters.
Line 6 # This is a comment.
# This is the last comment starting with blank characters.
sed '/\\\!$/d' hello.txt
This is the first line.
Line 2 ? argh
Line 3 ? hi
Line 4 starts with blank characters.
Line 6 # This is a comment.
# This is the last comment starting with blank characters.
sed '/! $/d' hello.txt
This is the first line.
Line 2 ? argh
Line 3 ? hi
Line 4 starts with blank characters.
Line 6 # This is a comment.
# This is the last comment starting with blank characters.