January 26, 2006

Learn how to use regular expressions the easy way

I am sure many of you will agree with me if I tell you that regular expressions are an integral part of all POSIX operating systems. So what exactly are regular expressions?

A regular expression is a pattern built from ordinary characters and special symbols that together describe a set of strings. The pattern is interpreted by tools such as grep, sed and awk rather than by the shell itself, which has its own, simpler wildcard syntax. Broadly speaking, there are a few frequently used characters which are interpreted in a special way. They are as follows:

? - Question mark means the preceding character occurs at most once. That is, 0 or 1 times.
* - Asterisk means the preceding character is repeated any number of times, including zero.
. - A dot matches any single character.
^ - The beginning of the line
$ - The end of the line
\ - Used when you want to represent the special characters literally. For example, to match '*', I can use '\*', which tells the regular expression tool to treat the asterisk as an ordinary character.
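
The characters above can be tried out from a terminal. What follows is a minimal sketch, assuming a grep that accepts the standard -E (extended syntax) and -c (count matching lines) options. The word list and the helper name count_matches are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample input: one word per line.
words='cat
cot
coat
ct
c*t
scatter'

# count_matches PATTERN: print how many sample lines match PATTERN.
# Callers single-quote the pattern so the shell hands it to grep untouched.
count_matches() {
    printf '%s\n' "$words" | grep -E -c "$1"
}

count_matches '^c.t$'    # dot: any single character between c and t
count_matches '^coa?t$'  # question mark: the 'a' is optional
count_matches '^ca*t$'   # asterisk: zero or more 'a' characters
count_matches 'c\*t'     # backslash: a literal asterisk
count_matches 't$'       # dollar: lines that end in t
```

Note the single quotes around every pattern. Without them the shell may expand characters such as * and ? as file name wildcards before grep ever sees them.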

For example, the regular expression 'ap?le*' will match the following strings:
aplee
aple
aleee

but not 'apple' or 'appplee' because 'p' can occur only 0 or 1 times.
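
The claim above is easy to check. This is a small sketch, assuming a grep with the standard -E, -x (the whole line must match) and -q (quiet) options. The function name matches_whole is only an illustrative helper.

```shell
#!/bin/sh
# matches_whole STRING: succeed if the entire STRING matches 'ap?le*'.
# -x requires the whole line to match; -q suppresses the normal output.
matches_whole() {
    printf '%s\n' "$1" | grep -E -x -q 'ap?le*'
}

# Try the strings from the article one by one.
for s in aplee aple aleee apple appplee; do
    if matches_whole "$s"; then
        echo "$s: match"
    else
        echo "$s: no match"
    fi
done
```

The first three strings are reported as matches and the last two are not, because 'p?' allows at most one 'p' between the 'a' and the 'l'.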

Usually, people are a bit confused the first time they have to use regular expressions. I was, the first time I used them.

For those who find learning regular expressions a real chore, there is a very useful utility called kregexpeditor, which ships with KDE on Linux. It is an effective way to get up to speed with regular expressions. See the figure below.

Fig: KRegExpEditor interface

kregexpeditor has a very intuitive interface and contains a built-in graphical editor, a verification window and a line edit where one can try out different regular expressions. For example, try figuring out for yourself what the following regular expression selects:
\b[A-Z0-9._%-]+@[A-Z0-9-]+\.[A-Z]{2,4}\b 
Hint: Open kregexpeditor and enter the above regular expression into the line edit box labeled Ascii Syntax. You will get a graphical representation of what the expression expands to in the upper portion of the editor.
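
If KDE is not at hand, the same expression can be explored from a terminal by running it against some sample text. This is only a sketch under a few assumptions: '\b' (word boundary) and the -o option are extensions found in GNU grep and may not work elsewhere, -i is added because the expression lists only capital letters, and the helper name find_matches and the sample sentence are made up.

```shell
#!/bin/sh
# The expression from the article, single-quoted so the shell leaves it alone.
pattern='\b[A-Z0-9._%-]+@[A-Z0-9-]+\.[A-Z]{2,4}\b'

# find_matches TEXT: print each part of TEXT that the pattern matches, one per line.
# -i ignores case; -o prints only the matched part instead of the whole line.
find_matches() {
    printf '%s\n' "$1" | grep -E -i -o "$pattern"
}

# Made-up sample text for illustration.
find_matches 'Write to ravi@example.com or visit example.com today'
```

Changing the sample sentence and watching what gets printed is a quick way to see which parts of the expression do what.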

6 comments:

Anonymous said...

This is a nice article you have here.

Thanks for sharing this topic.

Ben Helps said...

On the issue of people spamming as you, if it's on sites which log IP addresses and may be sympathetic to your plight (e.g. maybe ProBlogger, etc), approach the owners and see if they can help you track down the suckholes doing it (unless they're on anon proxies/dial-up accounts).

Iain said...

This is a learning obstacle that I have been too lazy to approach before. I will try out kregexpeditor and see if that can get me over it. Thanks!

Ravi said...

Ben,
Thanks for the suggestion. Actually, it doesn't really matter because the intention of maintaining this blog is my interest in Linux.
But, I was compelled to take a stand on this issue because my silence will only help spread an untrue opinion about this blog.

Anonymous said...

Very useful! Thanks!

Anonymous said...

Hi

I'm not experienced with regex but I sometimes use:

$ echo echo12Z | egrep --color '[a-z]'
echo12Z

"echo" is shown in red, which is useful for seeing what is matched.

Hope this helps!
PS: the site is nice and useful

action09
