The Beginner’s Guide To Regular Expressions

Regular expressions (abbreviated as a “regex”) are useful tools that help easily find and match text in strings and files. They sort of function like the typical find-in-document feature you are probably used to, only more advanced. However, they can definitely be too advanced for an average person to comprehend. In this guide, I will easily explain this widely used and useful feature so you can search files like a pro.

Something Worth Noting

It is worth noting that regexes are very useful and have lots of features to utilize, but can also be very non-standardized. There are always features in other regex parsing engines that are not supported by others. The features I will be displaying in this guide are the most widely supported ones, but there are always going to be features that I won’t teach you how to use here either because they are not known or because they are defunct and not considered good practice.

Basic Matching

Regexes can match specific strings, no complex syntax needed. For example, to match the text foo in notnotnotfoonotnot, your regex would simply be foo.

This works with multiple instances too; the regex foo would match both instances of foo in notnotnotfoonotnotfoo.

Something else that you should know about basic matching is that there will be two seperate matches if the same matching string is used multiple times (together or not).

This means that the regex foo would match all three foo instances in notnotfoofoonotfoo (notice how there are separate matches for the two instances of foo, even though said instances are together).

Match Any Character

The period selects any character. Note that each character is matched separately.

For example, the regex . in the text foo will match f, o, and the second o.

Notice how in the image above, even special characters and numbers are matched.

Letter Range Match

Let’s say in the text abcdefghijklmnopqrstuvwxyz, we want to match just e, f, g, h, and i. To do this, we can select the letter range that we want (in this case, e through i) using the regex [e-i].

Note that these letter ranges do not have to be subsequent; we can have the regex [a-c] in the text azaczbca match a, a, c, b, c, and a.

Number Range Match

Let’s say in the text 019201836y7 you only want to select the numbers 0, 1, 2, the second 0, and the second 1. In other words, we want to select any number from zero to two, including themselves. We can do this using the number range match, and in this case, we can use the regex [0-2].

These letter ranges also don’t have to be consecutive.

Character Set

Now let’s say you want to match specific characters that can’t be matched with a set. For this, we will use a character set. For example, we can match car and bar in the text car,bar using a character set. Because only the beginning letters of each word change, we can use a character set to match the c and b, and then use basic matching to match the ars. To do this, we can put in the characters we want to put in the set in square brackets.

Ignored Character Set

How about if we wanted to match every character except for c and b. To do this we can use an ignored character set to get the job done. If we wanted to match only tar in the text car,bar,tar, we can use an ignored character set to ignore c and b, then use basic matching as all three words have different initials but the same of everything else. To do this, we can put the characters we want to ignore in square brackets with the character ^ coming before them.

The Asterisk

The asterisk (*), when placed outside a character indicates that said character should be matched even if it does not occur at all or occurs at least once side by side. For example, the regex a* will match at least zero of a, so aaaa will be matched in the text aaaab. This is different from basic matching, as the regex a will count every a as a separate match, while a* will keep all the as in one match as many times as needed to complete the match.

Another examples is that the asterisk is like saying the character that comes before it is optional or can occur multiple times in a row. The regex ab*c in the text abc,abbc,ac will match abc, abbc, and ac.

The Question Mark

The question mark is just like the asterisk in the sense that the character that comes before it is deemed optional, but different in the sense that it will not match characters that occur multiple times in a row. For example, the regex ab?c will match abc and ac, but not abbc.

The Plus Symbol

The plus symbol is also just like the asterisk in the sense that the character that comes before it can occur multiple times in a row, but different in the sense that it does not indicate a character is optional. If we go back to our abc example, we can see that the regex ab+c in abc,abbc,ac will match abc and abbc, but not ac.

Conclusion

And that’s it – you’re done! There are so many other regex sequences you can use, and they just won’t fit into a beginner’s guide. As you saw above, regular expressions are very useful tools, and there are many ways to apply this to your needs.