What is Regex?
If you are reading this post, you most probably already know what a regex is. If you don't, here is a quick and easy definition.
Regex stands for Regular Expression and is essentially a compact way to define a pattern of characters. The most common uses of regex are pattern identification, text mining, and input validation.
Let’s Get Regex...
As we have seen, regex can be used to find a pattern in a given sentence, so let's start by finding a simple run of characters. We are going to look at regex with Python, as this is the programming language that I love to work with.
Python has a built-in package called re, which can be used to work with Regular Expressions.
Searching for a simple word
import re
word = "There is something that we are looking for"
x = re.findall("are", word)  # findall returns a list of every match
print(x)  # ['are']
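For comparison, here is a minimal sketch of how re.search and re.findall differ on the same sentence. The variable names are only illustrative; the point is that re.search gives back a Match object (or None), while re.findall gives back a plain list.

```python
import re

sentence = "There is something that we are looking for"

# re.search returns a Match object for the first hit, or None if nothing matches
match = re.search("are", sentence)
if match:
    print(match.group())  # the text that matched
    print(match.span())   # (start, end) position of the match

# re.findall returns a list with every non-overlapping match
all_matches = re.findall("are", sentence)
print(all_matches)
```

Use re.search when you only need to know whether (and where) the pattern occurs, and re.findall when you want every occurrence.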
Like every programming language, regex has some special characters, so we need to escape them when we want to match them literally. Let's see what happens when we use one directly, without an escape sequence.
import re
word = "www.creatorghost.com"
x = re.findall(".", word)  # an unescaped . matches every character
print(x)  # ['w', 'w', 'w', '.', 'c', 'r', 'e', 'a', 't', 'o', 'r', 'g', 'h', 'o', 's', 't', '.', 'c', 'o', 'm']
Now let’s see the output using Escape Sequence ( \ )
import re
word = "www.creatorghost.com"
x = re.findall(r"\.", word)  # Here only the literal . will be searched
print(x)  # ['.', '.']
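When the text to match comes from somewhere else and may contain special characters, escaping by hand gets tedious. A small sketch using the standard re.escape helper, with the same domain as above; the look-alike string is made up just to show the difference:

```python
import re

domain = "www.creatorghost.com"

# re.escape puts a backslash before every regex special character
escaped = re.escape(domain)
print(escaped)  # www\.creatorghost\.com

# Unescaped, each "." matches any character, so a look-alike string also matches
loose = bool(re.fullmatch(domain, "wwwXcreatorghostXcom"))
# Escaped, only the literal text matches
strict = bool(re.fullmatch(escaped, "wwwXcreatorghostXcom"))
exact = bool(re.fullmatch(escaped, domain))
print(loose, strict, exact)  # True False True
```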
Let's have a look at the most common metacharacters
Python Regular Expression Quick Guide
^ Matches the beginning of a line
$ Matches the end of the line
. Matches any character except a newline
\s Matches whitespace
? Makes the preceding character optional (zero or one time)
\S Matches any non-whitespace character
\r Carriage return character
* Repeats a character zero or more times
*? Repeats a character zero or more times (non-greedy)
+ Repeats a character one or more times
+? Repeats a character one or more times (non-greedy)
[aeiou] Matches a single character in the listed set
[^XYZ] Matches a single character not in the listed set
[a-z0-9] The set of characters can include a range
( Indicates where string extraction is to start
) Indicates where string extraction is to end
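A few entries from the guide are easier to grasp when you run them. The snippet below is a small sketch; all sample strings are invented for illustration.

```python
import re

# ? makes the previous character optional (zero or one time)
optional = re.findall(r"colou?r", "color colour")
print(optional)  # ['color', 'colour']

html = "<b>bold</b> and <i>italic</i>"
# Greedy + grabs as much as possible, so it runs to the last >
greedy = re.findall(r"<.+>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']
# Non-greedy +? stops at the first > it can
lazy = re.findall(r"<.+?>", html)
print(lazy)  # ['<b>', '</b>', '<i>', '</i>']

# A character set matches one character out of the listed ones
vowels = re.findall(r"[aeiou]", "regex")
print(vowels)  # ['e', 'e']

# Parentheses mark the parts to extract; findall returns them as tuples
parts = re.findall(r"(\S+)@(\S+)", "alice@example.com")
print(parts)  # [('alice', 'example.com')]

# With re.MULTILINE, ^ matches at the start of every line
starts = re.findall(r"^\w+", "first line\nsecond line", flags=re.MULTILINE)
print(starts)  # ['first', 'second']
```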
We have now seen the most common regular expression building blocks, so let's see how we can combine them to get a useful result. Let's walk through some real-world examples.
Case 1: Remove All URLs From Text
Sample text
text1
text2
http://url.com/bla1/blah1/
text3
text4
http://url.com/bla2/blah2/
text5
text6
http://url.com/bla3/blah3/
Now let's remove all of the URLs
import re
text = re.sub(r'^https?:\/\/.*[\r\n]*', '', text, flags=re.MULTILINE)
Output:
text1
text2
text3
text4
text5
text6
Let's see how it worked. The ^ anchors the match to the start of a line (this is what re.MULTILINE enables; without it, ^ only matches at the start of the whole text). Then http matches literally and s? makes the s optional, so both http and https match. Next, :\/\/ matches the :// (the backslashes before / are harmless in Python but not required). Then .* matches any characters up to the end of the line, and [\r\n]* consumes the line break that follows, so no blank line is left behind. That is how we select every URL line; Python's built-in re.sub then replaces each match with '' (an empty string).
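To try this end to end, here is a runnable sketch that rebuilds the sample text as a Python string (assuming each item sits on its own line) and applies the same pattern. The second pattern, https?://\S+, is not from the example above; it is a common variant for URLs that appear in the middle of a sentence, where ^ would never match.

```python
import re

sample = (
    "text1\n"
    "text2\n"
    "http://url.com/bla1/blah1/\n"
    "text3\n"
    "text4\n"
    "http://url.com/bla2/blah2/\n"
    "text5\n"
    "text6\n"
    "http://url.com/bla3/blah3/\n"
)

# Same idea as above: a URL at the start of a line, plus its trailing line break
url_line = re.compile(r"^https?://.*[\r\n]*", flags=re.MULTILINE)
cleaned = url_line.sub("", sample)
print(cleaned)

# Variant (an assumption, not part of the original example): URLs inside a sentence
inline = re.sub(r"https?://\S+", "", "see http://url.com/a for details")
print(repr(inline))  # 'see  for details'
```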
Case 2: Remove All Numbers From the Text
text 123
text 234
text 3
text4
text5
text6
Now let's remove all the numbers
import re
text = re.sub(r'[0-9]+', '', text,flags=re.MULTILINE)
Output:
text
text
text
text
text
text
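Here is the same step as a self-contained sketch, assuming the sample is one string with a line break after each item. Two details are worth knowing: re.MULTILINE only changes how ^ and $ behave, so this pattern works the same without it, and removing "123" from "text 123" leaves the space behind. The rstrip pass at the end is an optional extra, not part of the example above.

```python
import re

sample = "text 123\ntext 234\ntext 3\ntext4\ntext5\ntext6"

# [0-9]+ matches each run of one or more digits
no_digits = re.sub(r"[0-9]+", "", sample)
print(repr(no_digits))

# Optional tidy-up: drop the trailing space left on the first three lines
tidy = "\n".join(line.rstrip() for line in no_digits.splitlines())
print(tidy)
```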
Case 3: Remove All Special Characters
text @& for you 123
We want to remove all special characters such as @ and &
import re
text = re.sub(r'[^a-z0-9\s]', '', text,flags=re.MULTILINE)
Output (two spaces remain where @& used to be):
text  for you 123
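One thing to watch when reusing this pattern: the set [^a-z0-9\s] keeps only lowercase letters, digits, and whitespace, so uppercase letters are removed as well. The sketch below uses a made-up sample with capitals to show the difference, and adds an optional second pass that collapses the leftover double space.

```python
import re

text = "Text @& for You 123"

# The original set also deletes the uppercase T and Y
lower_only = re.sub(r"[^a-z0-9\s]", "", text)
print(repr(lower_only))  # 'ext  for ou 123'

# Adding A-Z to the set keeps capital letters
keep_case = re.sub(r"[^a-zA-Z0-9\s]", "", text)
print(repr(keep_case))  # 'Text  for You 123'

# Optional: squeeze runs of whitespace down to a single space
collapsed = re.sub(r"\s+", " ", keep_case).strip()
print(collapsed)  # Text for You 123
```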
Do we need to remember all the regular expressions?
The simple answer is no. You don't have to memorize every regular expression; most of the time you can search for the pattern you need and find it on Stack Overflow or a similar website. So you might be thinking: then why should we study this? Simply put, you at least need to understand the code you are copying from the internet, because it will not always suit your needs, and when you have to customize it, knowledge of regex will surely help.
I hope you liked this post. You can also read my previous post to learn why Python is so widely used nowadays. Thanks for reading.