Friday, July 3, 2015

Regular Expression Ride


In this post I will explain how regular expressions (regex) work, especially in Python, so let's start the learning ride with one interesting example.

Before that, one tip: if you remember a few very basic rules, regex is pretty easy to understand and you will not have to scratch your head when you are working with it.

So first of all, why do we need regular expressions?
Ans - Regular expressions are mainly used to check whether a given string matches a predefined pattern, or to search for that pattern inside a string.

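To match and search you can use the match() and search() methods. match() only checks for the pattern at the beginning of the string, while search() scans through the whole string. Both return a match object on success, and match objects always have a boolean value of True. Since match() and search() return None when there is no match, you can test whether there was a match with a simple if statement.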
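
As a minimal sketch of that if test, the snippet below runs the same pattern through both functions. The sample text and the variable names are made up for illustration; the point is only that match() looks at the start of the string while search() looks anywhere in it.

```python
import re

text = "ride 42"  # hypothetical sample text

# match() only tries at the very start of the string, so this is None
at_start = re.match(r"\d+", text)

# search() scans the whole string and finds the digits further in
anywhere = re.search(r"\d+", text)

if at_start:
    print("match() found:", at_start.group())
else:
    print("match() found nothing")

if anywhere:
    print("search() found:", anywhere.group())
```

Because None is falsy and a match object is truthy, the plain if statement is all that is needed to tell the two cases apart.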

For example:
I have one predefined regex stored in the "pattern" variable, so consider the two strings below:
import re

string = "prit.1.test.11.12p.com"
string2 = "prit.1.test.11.12pr.com"
-----------------------------------------------------------
pattern = re.compile(r"prit.\d.test.11.\d+[a-z].[a-z]+")
match = pattern.search(string)
if match:
    print("matched:", match.group())  # conditions
In the above code snippet match holds a match object, which is truthy, so the if block runs.
-----------------------------------------------------------

-----------------------------------------------------------
And consider below:
pattern = re.compile(r"prit.\d.test.11.\d+[a-z].[a-z]+")
match = pattern.search(string2)
if match:
    print("matched:", match.group())  # conditions
In the above code snippet match is None, so the if block is skipped.
------------------------------------------------------------


So, as you can see in the above examples, search() compares string and string2 with the pattern variable.
Now the pattern variable has the value: prit.\d.test.11.\d+[a-z].[a-z]+

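So below is the elaboration of it:
. - matches any single character except a newline (in the example it lines up with each "."; write \. to match only a literal dot)
\d - matches a single digit [0-9] (in the example "1")
\d+[a-z] - matches one or more digits [0-9] followed by exactly one lowercase letter (in the example "12p" matches, but "12pr" will not: the extra "r" uses up the pattern's next ".", so [a-z]+ then meets the "." before "com" and fails)
[a-z]+ - matches one or more lowercase letters (in the example "com")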
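
To see these pieces at work, here is a small sketch. It assumes the dots in the example strings are meant as literal dots, so it escapes them as \. in a stricter variant of the pattern; the names loose, strict and check are made up for this illustration.

```python
import re

# The pattern as written in this post: an unescaped . matches any character
loose = re.compile(r"prit.\d.test.11.\d+[a-z].[a-z]+")

# Stricter variant (an assumption about the intent): every dot is escaped
strict = re.compile(r"prit\.\d\.test\.11\.\d+[a-z]\.[a-z]+")

def check(regex, candidate):
    """Return True when the whole candidate fits the given compiled regex."""
    return regex.fullmatch(candidate) is not None

good = "prit.1.test.11.12p.com"    # one letter after the digits
extra = "prit.1.test.11.12pr.com"  # two letters after the digits
odd = "pritX1.test.11.12p.com"     # X where a dot was expected

for candidate in (good, extra, odd):
    print(candidate, check(loose, candidate), check(strict, candidate))
```

Both patterns accept the first string and reject the second. Only the strict one rejects the third, because the unescaped . in the loose pattern is happy to match the X.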


There are more such special sequences starting with the \ sign, which are listed below:

\s - Matches any single whitespace character (space, tab, newline)
\s+ - Matches one or more whitespace characters
\w - Matches any single alphanumeric character or underscore
\w+ - Matches one or more alphanumeric characters or underscores
\S - Matches any single non-whitespace character
\W - Matches any single character that \w does not match, i.e. symbols like (!,@,#,$,%) and also whitespace
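
A quick way to get a feel for these sequences is to run each one through re.findall() on the same text. The sample string below is invented for this sketch.

```python
import re

sample = "user_1 scored 95%!"  # hypothetical sample text

spaces = re.findall(r"\s", sample)       # each whitespace character
words = re.findall(r"\w+", sample)       # runs of letters, digits and underscores
non_spaces = re.findall(r"\S+", sample)  # runs of non-whitespace characters
non_words = re.findall(r"\W", sample)    # every character that \w does not match

print(spaces)      # [' ', ' ']
print(words)       # ['user_1', 'scored', '95']
print(non_spaces)  # ['user_1', 'scored', '95%!']
print(non_words)   # [' ', ' ', '%', '!']
```

Note that the underscore in "user_1" counts as a \w character, and that \W picks up the spaces as well as the symbols.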
