What are Regular Expressions?
grep is one of the most popular command-line utilities for finding and searching strings in a text file.
Using the grep command with regular expressions makes it even more powerful.
Regular expressions come into the picture when you want to search for text that matches a particular pattern rather than one fixed string.
grep tests the pattern against each line of the file and prints the lines that match.
Types of Regular expressions
For ease of understanding let us learn the different types of Regex one by one.
Basic Regular expressions
Some of the commonly used commands with Regular expressions are tr, sed, vi and grep. Listed below are some of the basic Regex.
Symbol | Description |
---|---|
. | Matches any single character |
^ | Matches the start of a line |
$ | Matches the end of a line |
* | Matches the preceding character zero or more times |
\ | Escapes a special character so that it is matched literally |
() | Groups regular expressions (written \( \) in basic mode, or ( ) with grep -E) |
? | Matches the preceding character zero or one time (written \? in GNU basic mode, or ? with grep -E) |
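To make the * and () rows concrete, here is a small sketch. The file name star_demo.txt and its contents are assumptions made up for illustration, and the grouping example relies on grep -E so that the parentheses can be written unescaped.

```shell
# Hypothetical sample data, one word per line.
printf '%s\n' ac abc abbc adc ababc > star_demo.txt

# b* means "zero or more b", so ac, abc and abbc match.
# ababc also matches because it contains the substring "abc".
star_matches=$(grep 'ab*c' star_demo.txt)

# With -E, (ab)+ means "the group ab, one or more times".
# ^ and $ anchor the pattern to the whole line.
group_matches=$(grep -E '^(ab)+c$' star_demo.txt)

echo "$star_matches"
echo "$group_matches"
```

The line adc is not printed by either command, because neither pattern allows a d between a and c.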
Let’s create a sample test.txt file with the following content:
cat test.txt
Output:
This record ends in 2021
2021 is the start of the record
2021
2020
This record ends in 2020
2020 is the start of the record
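To reproduce the examples that follow, the file can be created with printf. This is only a sketch: it assumes test.txt holds exactly the six lines shown above and no blank lines.

```shell
# Write the six sample lines to test.txt, one per line.
printf '%s\n' \
  'This record ends in 2021' \
  '2021 is the start of the record' \
  '2021' \
  '2020' \
  'This record ends in 2020' \
  '2020 is the start of the record' > test.txt

# Confirm the contents.
cat test.txt
```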
For example, find all the lines which end with the word 2021:
grep "2021$" test.txt
You should see the following output:
This record ends in 2021
2021
Next, find all the lines that contain only 2021, that is, lines that both start and end with it:
grep "^2021$" test.txt
You should see the following output:
2021
Next, find all the lines which end with 202 followed by any one character, such as 2020 or 2021 (the dot matches any single character in that position):
grep "202.$" test.txt
You should see the following output:
This record ends in 2021
2021
2020
This record ends in 2020
Now, let's display all the lines that start with the string 2021:
grep "^2021" test.txt
You should see the following output:
2021 is the start of the record
2021
Next, find all the blank lines in the file test.txt (the pattern ^$ matches a line with nothing between its start and its end):
grep "^$" test.txt
You should see the following output:
<3 empty lines>
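Because grep "^$" prints the blank lines themselves, the result is hard to read. As a sketch, the -c option makes grep print the number of matching lines instead. The file blank_demo.txt is a hypothetical example that contains three empty lines.

```shell
# Hypothetical file: three text lines separated by three empty lines in total.
printf 'first\n\nsecond\n\n\nthird\n' > blank_demo.txt

# -c prints the count of matching lines rather than the lines themselves.
blank_count=$(grep -c '^$' blank_demo.txt)

echo "$blank_count"
```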
Interval Regular expressions
These expressions specify how many times a character must occur in a row. They are:
Expression | Description |
---|---|
{n} | Matches the preceding character appearing exactly ‘n’ times |
{n,m} | Matches the preceding character appearing at least ‘n’ times but not more than ‘m’ times |
{n,} | Matches the preceding character only when it appears ‘n’ times or more |
Example:
Filter the lines that contain the character ‘p’ twice in a row
We want to check that the character ‘p’ appears 2 times in a string, one after the other. For this the syntax would be:
grep -E "p{2}" sample
Note: You need to add -E to use these expressions as written. Without -E, the braces must be escaped as \{ and \}.
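The two spellings of an interval can be compared side by side. The following is a minimal sketch; interval_demo.txt and its words are assumptions chosen for illustration.

```shell
# Hypothetical sample data.
printf '%s\n' apple maple happy pear > interval_demo.txt

# Extended syntax (-E): braces are written as-is.
ere_matches=$(grep -E 'p{2}' interval_demo.txt)

# Basic syntax (no -E): braces must be escaped.
bre_matches=$(grep 'p\{2\}' interval_demo.txt)

echo "$ere_matches"
echo "$bre_matches"
```

Both commands select the same lines, apple and happy, since those are the only words containing two consecutive p characters.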
Extended regular expressions
Extended regular expressions add operators that repeat or combine expressions. With grep -E they are written + and ?; in GNU grep's basic mode they are written \+ and \?, as shown here:
Expression | Description |
---|---|
\+ | Matches one or more occurrences of the previous character |
\? | Matches zero or one occurrence of the previous character |
Example:
Searching for the character ‘t’ preceded by ‘a’
Suppose we want to filter the lines where one or more ‘a’ characters immediately precede the character ‘t’.
We can use a command like:
grep "a\+t" sample
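The same operators can be written without backslashes when grep runs in extended mode. This is a sketch under assumed data: plus_demo.txt and the words in it are hypothetical.

```shell
# Hypothetical sample data.
printf '%s\n' cat caat ct color colour colouur > plus_demo.txt

# + requires at least one "a" between c and t, so ct is skipped.
plus_matches=$(grep -E 'ca+t' plus_demo.txt)

# ? makes the "u" optional, so both spellings match; colouur has two u's and does not.
opt_matches=$(grep -E 'colou?r' plus_demo.txt)

echo "$plus_matches"
echo "$opt_matches"
```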
The backslash (\) is used to search for special characters literally.
Let’s create a sample test.txt file with the following contents:
cat test.txt
Output:
1.1.1.1
1a1a1a1
1b1c1d1
Now, search for all the lines which match the pattern “1.1.1.1”:
grep "1.1.1.1" test.txt
This command does not show the proper result as “.” matches any single character:
1.1.1.1
1a1a1a1
1b1c1d1
You can escape each dot with a backslash “\” to resolve this issue:
grep "1\.1\.1\.1" test.txt
Output:
1.1.1.1
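When the whole pattern is meant literally, an alternative to escaping every dot is the -F option, which tells grep to treat the pattern as a fixed string. A minimal sketch, using a hypothetical copy of the sample data named dots_demo.txt:

```shell
# Same three lines as the sample file, written to a separate hypothetical file.
printf '%s\n' 1.1.1.1 1a1a1a1 1b1c1d1 > dots_demo.txt

# -F disables regular expression matching, so "." only matches a literal dot.
fixed_matches=$(grep -F '1.1.1.1' dots_demo.txt)

echo "$fixed_matches"
```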
Brace expansion
The syntax for brace expansion is either a sequence or a comma separated list of items inside curly braces “{}”. The starting and ending items in a sequence are separated by two periods “..”.
Some examples:
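As a minimal sketch of both forms, assuming the shell is bash (brace expansion is not available in every POSIX sh), with values and file names chosen purely for illustration:

```shell
# Sequence: the start and end items are separated by two periods.
seq_out=$(echo {1..5})

# Comma-separated list: text outside the braces is attached to every item.
list_out=$(echo file_{a,b,c}.txt)

echo "$seq_out"
echo "$list_out"
```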
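In the above examples, the echo command creates strings using brace expansion. Inside a quoted grep pattern, curly braces are not expanded by the shell; grep reads them as the interval expressions described earlier.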
Let’s create a sample test.txt file with the following contents:
cat test.txt
Output:
apple
appple
appppple
Now, search for all the lines in which the character “p” appears exactly two times between “a” and “l”:
grep -E "ap{2}l" test.txt
You should see the following output:
apple
Next, search for all the lines in which the character “p” appears two or more times between “a” and “l”:
grep -E "ap{2,}l" test.txt
You should see the following output:
apple
appple
appppple
Next, search for all the lines in which the character “p” appears two or three times between “a” and “l”:
grep -E "ap{2,3}l" test.txt
You should see the following output:
apple
appple
Square bracket expansion
A bracket expression [] matches any one character listed inside the brackets.
For example, search for all the lines in which “test” is followed by any one character in the range x to z:
grep "test[x-z]" test.txt
You should see the following output:
testx
testy
testz
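The contents of test.txt for this example are not shown, so the following sketch uses a hypothetical file, bracket_demo.txt, to show a range and its negation. LC_ALL=C is set as a precaution, since the meaning of a range such as x-z can depend on the locale.

```shell
# Hypothetical sample data.
printf '%s\n' testa testx testy testz test1 > bracket_demo.txt

# [x-z] matches exactly one of x, y or z after "test".
range_matches=$(LC_ALL=C grep 'test[x-z]' bracket_demo.txt)

# A leading ^ inside the brackets negates the set: any one character except x, y or z.
negated_matches=$(LC_ALL=C grep 'test[^x-z]' bracket_demo.txt)

echo "$range_matches"
echo "$negated_matches"
```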