How to use basic regular expressions to search better and save time
If you’re a programmer, you’ve no doubt been in this situation before: you’ve got to convert a bunch of data from one form to another. Maybe you’ve got an enum that needs to be processed with a switch statement (and you don’t have ReSharper to do it for you!), or maybe it’s static strings that need to be put into a dictionary…or a million other possibilities. If it’s only five or ten lines, you’d probably just retype it, or cut and paste: the rote approach. But what if it’s twenty lines…or a hundred? That’s when it’s worth it to get down and dirty with some regular expressions.
I had to do this just the other day: I had an HTML table containing a descriptive name and a product code, and some other (possibly empty) fields, and I needed to turn it into T-SQL ALTER statements. I’ll use this example to explain some of my favorite regex tricks.
What I had was something like this (I’ve substituted \t for the actual tabs):
And what I needed was this:
I did this in Visual Studio 2010, which uses its own brand of regular expressions, but I’ll make sure to point out where I’m using Microsoft-specific regex syntax.
1. Don’t try to do everything in one regular expression.
Sure, you probably could, but in doing so, you’re going to drive yourself crazy. It’s easiest to handle what you need to do in steps. The first thing I knew was going to cause trouble was the single quotation marks in the descriptions: those need to be escaped to be used in a T-SQL string. T-SQL escapes single quotation marks by doubling them, so the first replacement was easy (and didn’t even need regular expressions): replace ' with '' (two single quotation marks).
2. Don’t be greedy.
One of the subtleties of regular expressions that people often neglect is the difference between greedy and lazy matches. Greedy matching (the default) means that any matching specifier (like * or +) will keep on matching as long as it can, whether that’s what you intended or not. In general, you can get the job done with either greedy or lazy matching, but it’s often more work to use greedy matching (the default).
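To make the difference concrete, here is a minimal sketch in Python rather than Visual Studio. Python's `re` module uses the Perl-style `*?` for lazy matching, and the sample row is made up for illustration.

```python
import re

# Hypothetical tab-separated row: name, code, and two trailing fields.
row = "Widget\tW-100\tblue\t"

# Greedy: the first .* runs as far as it can, so it swallows the second field too.
greedy = re.match(r"(.*)\t(.*)\t", row)

# Lazy: .*? stops at the first tab that lets the rest of the pattern match.
lazy = re.match(r"(.*?)\t(.*?)\t", row)

print(greedy.groups())
print(lazy.groups())
```

The greedy version lumps the first two fields into one group, while the lazy version captures them separately.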
3. Tagged expressions in Visual Studio.
Note that VS uses curly brackets to denote tagged subexpressions, whereas basic regular expressions use \( and \) and extended regular expressions use plain parentheses.
In our example, the “obvious” regular expression won’t do what I want: replacing “{.*}\t{.*}” with “ALTER Product SET Name='\1' WHERE Model='\2';” yields:
Whoops! What’s going on here is that the first “.*” is greedily matching everything up to the last tab character, effectively lumping together the product name with the second field.
Of course I could just use “{.*}\t{.*}\t.*\t.*” but I like my regular expressions to be lazy, just like I am. Perl-style regular expression syntax uses *? and +? to specify the corresponding lazy matches, but Microsoft has chosen to go their own way and use @ and #.
So now I can finally get what I want by replacing “{.@}\t{.@}\t.*” with “ALTER Product SET Name='\1' WHERE Model='\2';”
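For readers outside Visual Studio, the same two-step conversion can be sketched in Python. This is only an illustration: the rows are hypothetical stand-ins for the sample data, the helper name `to_statement` is invented, and the replacement template is copied from the text as written.

```python
import re

# Hypothetical rows standing in for the sample data:
# descriptive name, product code, then other (possibly empty) fields.
rows = [
    "Men's Jacket\tMJ-01\t\t",
    "Kid's Hat\tKH-22\tred\t",
]

def to_statement(row: str) -> str:
    # Step 1: double single quotes so they are safe inside a T-SQL string.
    escaped = row.replace("'", "''")
    # Step 2: lazily capture the first two tab-separated fields, drop the rest.
    return re.sub(
        r"^(.*?)\t(.*?)\t.*$",
        r"ALTER Product SET Name='\1' WHERE Model='\2';",
        escaped,
    )

statements = [to_statement(r) for r in rows]
for s in statements:
    print(s)
```

Keeping the quote escaping as a separate step mirrors the advice not to do everything in one regular expression.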
4. Ctrl-Z is your friend.
Unless you’re using regular expressions many times a day every day, chances are you’re not always going to remember every modifier, and there might be a little trial and error. Fortunately, our friend Ctrl-Z is there to help us out. Don’t be afraid to shrug your shoulders and try something out…if it mangles your text, just undo it and try again.
5. If it’s not matching, make sure you’re not accidentally using an un-escaped metacharacter.
If you keep getting “no matches found”, you might want to check and make sure you’re not trying to match a literal with an unescaped metacharacter. VS in particular is tricky this way…almost every punctuation mark is a metacharacter in VS, and needs to be escaped (with a backslash).
6. Learning regular expressions is so worth it.
If you’re new to regular expressions, or inexperienced with them, this may all seem completely impenetrable and archaic, and I get it man, I do. There’s an undeniable learning curve to regular expressions, but once you reach a certain proficiency, you will save an unbelievable amount of time. There are a million good tutorials and books out there if you’re just getting started. I’m especially partial to Jim Hollenhorst’s tutorial, especially if you’re a VS user.
When I began writing about the .NET implementation of regular expressions, I intended to focus solely on the .NET classes and not on regular expressions themselves. To that end, I started with a couple of very basic tasks that can be performed with regular expressions: splitting strings with the Regex::Split method and using the Match and MatchCollection classes to enumerate found literals or patterns.
Many readers wrote in explaining that they are new to regular expressions, and they want at least an article or two that explain some basic patterns, instead of having all the examples search for literals or unexplained patterns. Therefore, this article presents some very basic regular expression patterns to get people who are new to this area off and running.
Searching for Letters and Words
The first thing to note is that regular expression patterns consist of metacharacters, which are simply characters that represent other characters and tell the regular expressions parser how to scan the input string and what to search for.
The following sections illustrate some basic patterns that utilize some of the more commonly used metacharacters. The end of the article presents a table that you can use as a metacharacter quick-reference when creating your own patterns.
Here’s a basic string that all the example patterns in this article use:
Now, suppose you want to locate all the proper nouns in this sentence. Logically, you would know that you need to locate every capitalized word. In terms of a regular expression pattern, that would look like the following:
[A-Z][a-z]+[ ]*
If you’re new to regular expressions, this definitely will look a bit strange at first. The following breakdown explains each of the pattern’s components:
[A-Z]
The A-Z indicates that you’re looking for any letter within the range of a capital A to a capital Z. The square brackets group the range, like the parentheses in a numeric equation, to remove ambiguity.
[a-z]+
Once again, a range of characters is being searched for, this time the lowercase letters a-z. However, the plus sign (+) after the right bracket indicates that the parser will search for at least one match to the criteria specified in the brackets. In other words, the parser is being instructed to look for one or more lowercase letters.
[ ]*
This part of the pattern indicates that a space will be located, but the asterisk after the right bracket indicates that the parser will match zero or more spaces. This allows the pattern to properly handle the end of a string or multiple spaces between words.
So there you have it. The pattern simply states “Find a capital letter, followed by one or more lowercase letters, followed by any number of spaces.”
Using this pattern results in the following list of matches:
However, the pattern has two problems. First, it will not catch capitalized abbreviations or acronyms, as it stipulates that only one capital letter will be matched, followed by lowercase letters. Therefore, the current pattern used on the following input value will not yield the match for IBM:
To fix this problem, you need to modify the pattern as follows, where I’ve bolded the change:
What I’ve inserted into this pattern is the A-Z range and the vertical bar separator (|), which acts as an “or” operator. Therefore, the pattern now states: “Find a single capital letter followed by one or more upper and lower case letters followed by any number of spaces”. The pattern will now yield the following:
The second problem that the pattern has is it doesn’t handle singular pronouns correctly. This is the easiest problem to solve. All you need to do is replace the plus sign in the pattern with an asterisk, so that the parser knows that there may not be a sequence of letters following the capital letter. Using an input value of: “John, Harry and I are members of the Borbon club at IBM.”, the pattern would be as follows:
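The final form of the pattern can be tried out with a short Python sketch. Two assumptions are worth flagging: the character class is written as `[a-zA-Z]` (inside square brackets a vertical bar would be taken literally in most engines), and the trailing `[ ]*` is left out so the matches carry no trailing spaces.

```python
import re

sentence = "John, Harry and I are members of the Borbon club at IBM."

# One capital letter, then zero or more letters of either case.
# The asterisk (instead of +) lets the single-letter word "I" match.
pattern = re.compile(r"[A-Z][a-zA-Z]*")

matches = pattern.findall(sentence)
print(matches)
```

With this input the sketch picks up the names, the pronoun, and the acronym in one pass.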
As promised, the following table contains the most commonly used metacharacters.
Table 1: Commonly used regular expressions metacharacters
| Expression | Description |
|---|---|
| . | Matches any character except \n |
| [characters] | Matches a single character in the list |
| [^characters] | Matches a single character not in the list |
| [charX-charY] | Matches a single character in the specified range |
| \w | Matches a word character; same as [a-zA-Z_0-9] |
| \W | Matches a non-word character |
| \s | Matches a whitespace character; same as [ \n\r\t\f] |
| \S | Matches a non-whitespace character |
| \d | Matches a decimal digit; same as [0-9] |
| \D | Matches a nondigit character |
| ^ | Matches the beginning of a line |
| $ | Matches the end of a line |
| \b | Matches on a word boundary |
| \B | Matches when not on a word boundary |
| * | Zero or more matches |
| + | One or more matches |
| ? | Zero or one match |
| {n} | Exactly n matches |
| {n,} | At least n matches |
| {n,m} | At least n but no more than m matches |
| ( ) | Capture matched substring |
| (?<name> ) | Capture matched substring into group name |
| \| | Logical OR |
Simply combine these metacharacters with what you learned in the previous articles on string splitting and using the Match and MatchCollection classes and you’ll be surprised at how easily you can search for many basic patterns.
More Advanced Uses of Regular Expressions
At this point, you have the basic knowledge required to form regular expressions and use them in your Managed C++ code. While what you’ve learned thus far will work for a lot of common parsing needs, regular expressions allow you to do so much more than search for simple character patterns. For example, you can:
- Search for email addresses where the number of valid formats leads to very complex patterns
- Search and replace specific patterns
- Extract specific information, such as searching for phone numbers and then extracting only the area code
In order to move into these more advanced areas of regular expressions use, you’ll need to know about groups and captures. Therefore, future articles will cover these areas and examine some of the tasks mentioned in this article.
About the Author
Tom Archer owns his own training company, Archer Consulting Group, which specializes in educating and mentoring .NET programmers and providing project management consulting. If you would like to find out how the Archer Consulting Group can help you reduce development costs, get your software to market faster, and increase product revenue, contact Tom through his Web site.
This tutorial explains in detail how to use extended regular expressions with the grep command. Learn what extended regular expressions are and how they work with grep through a practical example that extracts all links from an HTML file.
Extended regular expressions
A regular expression is a search pattern that the grep command matches in a specified file or in provided text. To let a user express the regular expression in a more customized way, grep assigns special meanings to a few characters. These characters are known as Meta characters. Initially, grep assigned the characters ^ $ . [ ] and * as Meta characters. Later, a few more characters were added to this list: ( ) ? + and |.
Based on the use of Meta characters, regular expressions can be divided into two categories: BRE (Basic Regular Expression) and ERE (Extended Regular Expression).
Basic Regular Expression: An expression which uses the default Meta characters.
Extended Regular Expression: An expression which uses the later added Meta characters.
This tutorial is the last part of the article “grep command in Linux explained with options and regular expressions”. The first part explains grep command options and regular expressions with the special meanings of Meta characters. The second part explains how to use options with the grep command, with practical examples. The third part explains how to use regular expressions with the grep command, with practical examples.
How to use extended regular expression
Extended regular expressions use the Meta characters which were added later. Since the later added characters are not defined in the original implementation, grep treats them as regular characters unless we ask it to use them as Meta characters.
To instruct the grep command to use the later added characters as Meta characters, the -E option is used. Let’s take an example. In the original implementation, the pipe sign (|) is defined as a regular character, while in the new implementation it is defined as a Meta character.
If we use the pipe sign without the -E option, grep will treat it as a regular character. But if we use it with the -E option, grep will treat it as a Meta character. As a Meta character, it is used to search for multiple words. Let’s search for two users’ information in the file /etc/passwd with and without the -E option.
Without the -E option, grep searched for the pattern as a single word sanjay|rick in the file /etc/passwd. With the -E option, it separated the pattern into two words, sanjay and rick, and searched for them individually.
grep extended regex (search multiple words)
The pipe sign (|) is used to search multiple words with grep command. To search multiple words with grep command, connect all of them with pipe sign and surround by quote signs. For example to search words abc, fgh, xyz, mno and jkl, use the search pattern “abc|fgh|xyz|mno|jkl”.
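The two readings of the pipe sign can be imitated in Python as a rough analogy (this is not grep itself). `re.escape` makes the pipe an ordinary character, as grep does without -E, while the unescaped pattern treats it as alternation. The sample lines are hypothetical.

```python
import re

# Hypothetical lines in the style of /etc/passwd.
lines = [
    "sanjay:x:1001:1001::/home/sanjay:/bin/bash",
    "rick:x:1002:1002::/home/rick:/bin/bash",
    "odd:x:1003:1003:sanjay|rick:/home/odd:/bin/bash",
]

# Pipe treated as an ordinary character (like grep without -E).
literal = re.compile(re.escape("sanjay|rick"))
# Pipe treated as alternation (like grep -E).
alternation = re.compile("sanjay|rick")

literal_hits = [line for line in lines if literal.search(line)]
alternation_hits = [line for line in lines if alternation.search(line)]

print(len(literal_hits), len(alternation_hits))
```

Only the line that contains the text sanjay|rick verbatim matches the literal form, while all three lines match the alternation.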
grep extended regex (search all links with linked text from an html file)
To extract all links from an HTML file named html_file, use the following command.
Let’s understand the above command in detail.
grep command options
-E :- This option tells the grep command that the search pattern contains the Meta characters which were added later.
-o :- By default, grep prints the entire line which contains the search pattern. This option tells the grep command to print only the matching text instead of the entire line.
-i :- This option asks the grep command to ignore case while matching the pattern.
Extended regular expression
[^>] :- Match any character except >. This is followed by the Meta character +, which instructs the grep command to match it one or more times.
> :- Ending point of the opening anchor tag.
.* :- The Meta character dot (.) represents any single character and the star (*) repeats the preceding item zero or more times. We used both together to match any characters between the opening and closing anchor tags.
</a> :- This is the closing point of the anchor tag.
Collectively, the above search patterns match an opening anchor tag with its attributes, followed by any linked text, followed by the closing anchor tag.
The following figure illustrates the use of the above regex.
grep regex (print only anchor tag)
If you are interested only in anchor tags, you can exclude the expression which prints the linked text, as follows.
grep regex ( extract all links or URLs from an html file and save them in a text file)
To extract all links or URLs from an HTML file and save them in a text file, we have to combine three commands. These commands are:
We have to combine these commands in the following way.
In the above commands,
- The first command receives its input from the file named html_file, the second command receives its input from the first command, and the third command receives its input from the second command.
- The first command extracts all anchor tags from the HTML file and sends its output to the second command instead of printing it at the command prompt.
- The second command extracts all href attributes from the output of the first command and sends its output to the third command.
- The third command extracts all links from the output of the second command and saves the output to a text file named link-only.
The following figures explain the above commands with output.
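The three-stage idea (anchor tags, then href attributes, then URLs) can be sketched in Python as well. This is an illustration under stated assumptions, not the commands from the tutorial: the HTML snippet is hypothetical, the three patterns are simple guesses at what each stage needs, and saving the result to a file is left out.

```python
import re

# Hypothetical page content; the tutorial's html_file is not reproduced here.
html = '''
<p>See <a href="https://example.com/docs">the docs</a> and
<A HREF="http://example.org/faq" class="x">the FAQ</A>.</p>
'''

# Stage 1: opening anchor tags, matched case-insensitively.
anchors = re.findall(r"<a [^>]+>", html, flags=re.IGNORECASE)

# Stage 2: the href attribute inside each tag.
hrefs = [
    m
    for a in anchors
    for m in re.findall(r'href="[^"]+"', a, flags=re.IGNORECASE)
]

# Stage 3: the URL inside each attribute.
links = [m for h in hrefs for m in re.findall(r'https?://[^"]+', h)]

print(links)
```

Each stage narrows the output of the previous one, which is the same design as piping one command into the next.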
That’s all for this tutorial.
By ComputerNetworkingNotes Updated on 2021-06-25 10:04:38 IST
In Visual Basic, a regular expression (regex) is a pattern that is useful for parsing text and validating whether a given input matches the defined pattern (such as an email address) or not.
Generally, the key part of processing text with regular expressions is the regular expression engine, which is represented by the Regex class in Visual Basic. The Regex class is available in the System.Text.RegularExpressions namespace.
To validate the given input text using regular expressions, the Regex class expects the following two items of information.
- The regular expression pattern to identify in the text.
- The input text to parse for the regular expression pattern.
Visual Basic Regular Expression Example
Following is the example of validating whether the given text is in proper email format or not using Regex class in visual basic.
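Sub Main(ByVal args As String())
    Dim email As String = "[email protected]"
    ' NOTE: the original pattern was lost; this simple stand-in is an assumption.
    Dim result As Boolean = Regex.IsMatch(email, "^[^@\s]+@[^@\s]+\.[^@\s]+$")
    Console.Write("Is valid: {0}", result)
End Sub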
If you observe the above example, we are validating whether the input string (email) is in a valid email format by passing the input string and the regular expression pattern to the IsMatch method of the Regex class.
When we execute the above code, we will get the result shown below.
This is how the Regex class is useful to validate the input string by sending regular expression patterns based on our requirements.
Visual Basic Regex Class Methods
In Visual Basic, the Regex class has different methods to perform various operations on an input string. The following table lists various methods of the Regex class in Visual Basic.
| Method | Description |
|---|---|
| IsMatch | It will determine whether the given input string matches the regular expression pattern or not. |
| Matches | It will return one or more occurrences of text that matches the regular expression pattern. |
| Replace | It will replace the text that matches the regular expression pattern. |
| Split | It will split the string into an array of substrings at the positions that match the regular expression pattern. |
These Regex class methods are useful to validate, replace or split the string values by using regular expression patterns based on our requirements.
Visual Basic Regex Replace String Example
Following is the example of finding the substrings using regular expression patterns and replace with required values in visual basic.
Sub Main(ByVal args As String())
    Dim str As String = "Hi,[email protected]#lane.com"
    ' Replace each run of characters outside [a-zA-Z0-9_] with one space.
    Dim result As String = Regex.Replace(str, "[^a-zA-Z0-9_]+", " ")
    Console.Write("{0}", result)
End Sub
If you observe the above example, we used the Regex.Replace method to find all the special characters in a string and replace them with a space using the regular expression pattern ( "[^a-zA-Z0-9_]+" ).
Here, the regular expression pattern ( "[^a-zA-Z0-9_]+" ) will match any run of one or more characters that are not in the defined character group.
When we execute the above example, we will get the result as shown below.
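For comparison, the same replacement can be sketched in Python with `re.sub`. The input string here is a hypothetical one chosen for illustration, not the string from the example above.

```python
import re

# Hypothetical input, chosen only for illustration.
text = "Hi, welcome@to#example.com"

# Each run of characters outside [a-zA-Z0-9_] collapses to a single space.
cleaned = re.sub(r"[^a-zA-Z0-9_]+", " ", text)

print(cleaned)
```

Because of the `+`, the comma and the space after "Hi" are replaced together by one space rather than two.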
Visual Basic Regex Find Duplicate Words Example
Generally, while writing content we make common mistakes such as duplicating words. By using a regular expression pattern, we can easily identify duplicate words.
Following is the example of identifying the duplicate words in a given string using Regex class methods in visual basic.
Sub Main(ByVal args As String())
    Dim str As String = "Welcome To to Tutlane.com. Learn c# in in easily"
    ' \1 refers back to the word captured by (\w+?).
    Dim collection As MatchCollection = Regex.Matches(str, "\b(\w+?)\s\1\b", RegexOptions.IgnoreCase)
    For Each m As Match In collection
        Console.WriteLine("{0} (duplicates '{1}') at position {2}", m.Value, m.Groups(1).Value, m.Index)
    Next
End Sub
If you observe the above example, we used the Regex.Matches method to find the duplicated words using the regular expression pattern ( "\b(\w+?)\s\1\b" ).
Here, the regular expression pattern ( "\b(\w+?)\s\1\b" ) will perform a case-insensitive search and identify duplicate words which appear side by side (such as "To to" or "in in").
When we execute the above example, we will get the result as shown below.
To to (duplicates ‘To’) at position 8
in in (duplicates ‘in’) at position 36
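The same pattern also works in Python's `re` module, which makes for a quick cross-check. This is a sketch only; the variable names are illustrative.

```python
import re

text = "Welcome To to Tutlane.com. Learn c# in in easily"

# \b(\w+?)\s\1\b : a word, one whitespace character, then the same word again.
pattern = re.compile(r"\b(\w+?)\s\1\b", re.IGNORECASE)

found = [(m.group(0), m.group(1), m.start()) for m in pattern.finditer(text)]
for whole, word, pos in found:
    print(f"{whole} (duplicates '{word}') at position {pos}")
```

The backreference `\1` is what ties the second word to the first, and the case-insensitive flag lets "To" match "to".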
This is how we can use regular expressions in visual basic to parse and validate the given string based on our requirements.
A regular expression (sometimes called a rational expression) is a sequence of characters that defines a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. “find and replace”-like operations (Wikipedia).
Regular expressions are a generalized way to match patterns with sequences of characters. They are used in most programming languages, such as C++, Java and Python.
What is a regular expression and what makes it so important?
Regular expressions are used in Google Analytics for URL matching, and they support search and replace in most popular editors like Sublime, Notepad++, Brackets, Google Docs and Microsoft Word.
The above regular expression can be used for checking if a given set of characters is an email address or not.
How to write regular expression?
\s : matches any whitespace character such as space and tab
\S : matches any non-whitespace character
\d : matches any digit character
\D : matches any non-digit character
\w : matches any word character (basically alphanumeric)
\W : matches any non-word character
\b : matches any word boundary (this would include spaces, dashes, commas, semi-colons, etc)
[set_of_characters] – Matches any single character in set_of_characters. By default, the match is case-sensitive.
[^set_of_characters] – Negation: Matches any single character that is not in set_of_characters. By default, the match is case sensitive.
[first-last] – Character range: Matches any single character in the range from first to last.
The Escape Symbol : \
If you want to match for the actual ‘+’, ‘.’ etc characters, add a backslash( \ ) before that character. This will tell the computer to treat the following character as a search character and consider it for matching pattern.
Grouping Characters ( )
A set of different symbols of a regular expression can be grouped together to act as a single unit and behave as a block, for this, you need to wrap the regular expression in the parenthesis( ).
Vertical Bar ( | ) :
Matches any one element separated by the vertical bar (|) character.
Backreference: allows a previously matched sub-expression(expression captured or enclosed within circular brackets ) to be identified subsequently in the same regular expression. \n means that group enclosed within the n-th bracket will be repeated at current position.
(?# comment ) : Inline comment. The comment ends at the first closing parenthesis.
# [to end of line] :X-mode comment. The comment starts at an unescaped # and continues to the end of the line.
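Both comment styles are available in Python's `re` module, so a small sketch can show them side by side. The year-and-month pattern is a made-up example, and `re.VERBOSE` is Python's name for the extended (X) mode.

```python
import re

# The same pattern written twice: once with inline (?#...) comments,
# once in verbose (X) mode with # comments that run to the end of the line.
inline = re.compile(r"\d{4}(?#year)-\d{2}(?#month)")

verbose = re.compile(
    r"""
    \d{4}   # four-digit year
    -       # literal hyphen
    \d{2}   # two-digit month
    """,
    re.VERBOSE,
)

sample = "released 2021-06"
print(inline.search(sample).group(0))
print(verbose.search(sample).group(0))
```

In verbose mode unescaped whitespace in the pattern is ignored, which is what allows the pattern to be laid out one piece per line.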
This article is contributed by Abhinav Tiwari.
In the previous post, we saw how to use grep to search for words, along with its powerful switches. In this post we will see how to use basic regular expressions to increase the power of the grep command.
First we will have a look at few regular expressions provided by UNIX as below:
Basic Regular Expressions:
| Symbol | Meaning |
|---|---|
| ^ | Caret symbol: match the beginning of the line |
| $ | Match the end of the line |
| * | Match zero or more occurrences of the previous character |
| . | Match any single character |
| [] | Match any one character from a set or range |
| [a-z] | Match one lowercase letter |
| [A-Z] | Match one capital letter |
| [0-9] | Match one digit |
| [^] | Negate a set: match any one character not listed |
| \ | Escape the following special character |
Now as we have seen the above regular expressions – Greek Symbols :-), it will be good to use them in some examples to have a better understanding:
Example1: Find all the lines in a file which start with “Mr”
grep '^Mr' filename
Example2: Find all the lines which end with “sh”
grep 'sh$' filename
Example3: Display all the lines in a file except empty lines.
grep -v '^$' filename
Example4: Search for a three-letter sequence which starts with x and ends with m, with any lowercase letter in between.
grep 'x[a-z]m' filename
Example5: Search for lines which contain at least one character other than ‘a’ or ‘c’.
grep '[^ac]' filename
Note: [^ac] negates a set of single characters, not the sequence “ac”. To display the lines which do not contain “ac”, use grep -v 'ac' filename instead.
Example6: Search for a ‘[’ in a file
grep '\[' filename
Note: The “[” is a special character, so you have to use a \ to escape it.
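The patterns from these examples behave the same way in Python, which makes it easy to check them against a few lines. This is a sketch with hypothetical file contents. It also shows that `[^ac]` is a set of single characters, so it is not the same as filtering out lines that contain "ac".

```python
import re

# Hypothetical file contents.
lines = ["Mr Smith", "", "bash", "xam ok", "ac", "a[0]"]

starts_with_mr = [l for l in lines if re.search(r"^Mr", l)]
ends_with_sh = [l for l in lines if re.search(r"sh$", l)]
non_empty = [l for l in lines if not re.search(r"^$", l)]
three_letters = [l for l in lines if re.search(r"x[a-z]m", l)]
# [^ac] needs at least one character that is neither 'a' nor 'c'.
has_other_char = [l for l in lines if re.search(r"[^ac]", l)]
# Lines that do not contain the substring "ac" (the effect of grep -v 'ac').
without_ac = [l for l in lines if "ac" not in l]
has_bracket = [l for l in lines if re.search(r"\[", l)]

print(has_other_char)
print(without_ac)
```

The two lists differ on the empty line: it has no character for `[^ac]` to match, yet it also does not contain "ac".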
JavaScript supports regular expressions through the RegExp class. For example:
It can match the first “apple” string in a string and is case sensitive. Adding “g” as the second parameter of the constructor searches for all occurrences of “apple” in the string, where “g” stands for “global”. If the second parameter is “i”, the match is case insensitive, and the case of letters is not considered in the matching process. Combining the two, we can search for all “apple” strings, regardless of case.
There is more than one way to write a regular expression. Using Perl-style syntax, the above expressions can be expressed as:
After creating a RegExp object, its methods can be used to perform different kinds of matching. Because regular expressions operate on strings, some methods of String also play an important role when working with regular expressions.
Method of regexp object
The output of the above code is “true”, because the sample string contains the string “apple” to be matched. This is the simplest detection method. Sometimes, we need to know the detailed results of the match, for example:
By using the exec() method, the returned arr is an array of matching results, including each matching value and its segments, such as “green apples” or “red apples” in the above example. The match() method serves the same purpose as exec(), but it is called on the string instead of on the RegExp object.
The search() method is similar to indexOf(): it returns the position of the first matching string.
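As a rough analogy in Python (not JavaScript), the `re` module offers counterparts to these operations. The mapping in the comments is approximate, and the sample string is made up for illustration.

```python
import re

sample = "Green Apples and red apples"
pattern = re.compile(r"apple", re.IGNORECASE)  # similar in spirit to the "i" flag

contains = pattern.search(sample) is not None  # roughly what test() reports
all_matches = pattern.findall(sample)          # roughly a global ("g") search
first_index = pattern.search(sample).start()   # roughly what search() returns

print(contains, all_matches, first_index)
```

Python has no global flag: `search` stops at the first match, while `findall` and `finditer` walk through all of them.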
String method
The replace() method of string can replace the specified string with another string:
The same effect can be achieved by replacing the first parameter of replace() with a regular expression
The second parameter of replace() can also be a function, which takes the matched string as a parameter and returns the replacement string.
Regular expressions can achieve the same function as the split () method of string.
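The string-side operations have close Python counterparts too. The sketch below is an analogy under that assumption: `re.sub` accepts either a replacement string or a function, and `re.split` splits on a pattern. The sample strings are hypothetical.

```python
import re

sample = "Green apples and red Apples"

# Replace with a plain string: every match becomes "pears".
swapped = re.sub(r"apples", "pears", sample, flags=re.IGNORECASE)

# Replace with a function: it receives the match object and returns the replacement.
shouted = re.sub(r"apples", lambda m: m.group(0).upper(), sample, flags=re.IGNORECASE)

# Split on a pattern instead of a fixed separator.
parts = re.split(r"\s*,\s*", "red , green,blue")

print(swapped)
print(shouted)
print(parts)
```

A replacement function is useful when the new text depends on what was matched, as with the upper-casing here.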
We’ve all been there. You have a string input and need a fast and efficient way to parse something important out of it. Your options are limited, since manually parsing a string for a pattern is tedious and often very inefficient as the string gets larger. So what do you do? You turn to Regular Expressions! What’s the problem with that? Let’s explore.
Impossible To Read or Debug
A huge assumption that is made when creating regular expressions is that the schema you are programming to won’t change. If it does, it could require rewriting the regular expression to hopefully produce the same usable output. But let’s say you are tasked with fixing a broken regular expression that fell victim to a changing schema. It means that you would have to first understand how the regex worked with the old schema, before understanding how the new schema changed. Only then can you rewrite the regular expression to account for the new input. That’s a fairly tedious process that is potentially very error prone. And the level of difficulty goes up exponentially with the length and complexity of regular expressions. I would hate to be the only one in charge of fixing this 6.2kb monster that validates RFC822 email addresses.
Regex Abuse
A common use case for regular expression is something like the following:
This regular expression tries to emulate a parser to rip out useful information into named capture groups from a structured data set like JSON. The benefit of this is that (in C# at least) you then have a reference to exactly what the regular expression matched on.
The downside of this is that you are using the wrong tool for the job. As much as it might seem like a quick and easy solution, it causes more problems than it solves. Parsing JSON, XML, or even HTML with regular expressions is a terrible idea. And it’s mostly a solved problem. Check out this HTML Python Parser. Using a tool like this will make your coding easier and make code maintenance easier in the future.
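For JSON in particular, a real parser usually takes only a couple of lines. Here is a minimal sketch using Python's standard `json` module. The payload and its field names are made up for illustration.

```python
import json

# Hypothetical payload; the field names are invented for this sketch.
payload = '{"user": {"name": "Ada", "tags": ["admin", "dev"]}}'

# The parser handles nesting, escaping, and whitespace for us.
data = json.loads(payload)
name = data["user"]["name"]
tags = data["user"]["tags"]

print(name, tags)
```

If the schema changes, the fix is a changed key lookup rather than a rewritten regular expression.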
Balancing Act
I know most of this article has been bashing the use of regular expressions, but there are some benefits to using them (if used correctly). All developers and engineers should learn to use basic regular expressions, because they’ll produce better, more flexible, more maintainable code with them. When used responsibly, regular expressions are a huge net positive. For example, writing a regular expression to validate a phone number is relatively straightforward:
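As one possible illustration, here is a small Python sketch. It assumes a North American style number (a three-digit area code, optionally in parentheses, then three and four digits with an optional space, dot, or hyphen between the parts). The names `PHONE`, `is_phone`, and `area_code` are invented for this example.

```python
import re
from typing import Optional

# Assumed format: optional parentheses around a 3-digit area code, then
# 3 and 4 digits separated by an optional space, dot, or hyphen.
PHONE = re.compile(r"\(?(\d{3})\)?[ .-]?\d{3}[ .-]?\d{4}")

def is_phone(text: str) -> bool:
    # fullmatch requires the whole string to fit the pattern.
    return PHONE.fullmatch(text) is not None

def area_code(text: str) -> Optional[str]:
    # Group 1 captures the area code when the number is valid.
    m = PHONE.fullmatch(text)
    return m.group(1) if m else None

print(is_phone("(775) 555-0100"), is_phone("555-0100"), area_code("775-555-0100"))
```

The pattern stays short because the format is narrow. Supporting international numbers would need a different approach.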
Conclusion
Regular expressions are extremely powerful and useful in the right situation. When abused and used in incorrect situations, they can lead to ugly and unmaintainable code. So use them wisely!
Course content
My First Section: 5 lectures • 18 min
Characters: 5 lectures • 22 min
Sets: 4 lectures • 25 min
Repetition: 4 lectures • 17 min
Grouping: 4 lectures • 21 min
Anchors and boundaries: 3 lectures • 12 min
Backreferencing, assertions and lookaheads: 5 lectures • 32 min
Unicode – Multi Language Symbol Support: 2 lectures • 13 min
Regular Expressions Examples: 4 lectures • 26 min
Description
What is this course?
This course is universal, meaning that the regular expression material you learn here will be applicable in most if not all regular expression engines.
Of course there will be some variations when we are implementing regular expressions in different engines, let’s say PHP over Javascript, but the core fundamentals and how you do regular expression stays the same everywhere.
Regular expressions are also called Regex, Regexp or Regexes, so we will probably be using this vocabulary, since it’s easier to pronounce and is what we commonly use in the programming community.
We will learn how to implement regular expressions in Javascript and PHP but these lectures are done for demonstration on how Regexes are used in programming languages.
Who is this course for?
New developers that want to learn regular expressions.
Frustrated developers who had issues learning it before.
Any developer who is serious about their programming career.
Some information about the course structure!
We will start slow with the most basic regular expression functionality, like searching and matching, learning what each of the symbols do and how to use them to do what we need.
After we learn the most basic things, we will start with more complex operations and real-world solutions. I always try to keep the lectures short so that the material is easy to digest.
At the end of every section we are going to have some practice code so that we can reinforce everything for that section.