Regular Expressions and Validation

Page 1 - Introduction and Basic Matching

Regular Expressions (RegExp, Regex, etc.)


Nov 19

Regular expressions are a popular way of performing complex string matching. To anyone relatively new to them, though, they can be daunting. This guide shows how to use them, with examples of their use in validation.

A regular expression is a series of characters and symbols that is used to try and match the contents of a string.

A regular expression, often called a pattern, is an expression that describes a set of strings. Patterns are usually used to give a concise description of a set without having to list all of its elements.

Probably the easiest way to describe how a regular expression works is to give a simple example.

reg(ular)?exp(ressions)?

The above example demonstrates grouping and optional matching. The pattern matches any of four strings: regexp, regularexp, regexpressions, and regularexpressions. All four match because (ular)? and (ressions)? are optional: the parentheses group those characters, and the ? quantifier after each group lets it occur zero or one time.
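The pattern can be checked quickly in code. The guide does not name a language, so the sketch below assumes Python and its standard re module; the candidate strings and the names OPTIONAL_GROUPS and group_results are made up for illustration. re.fullmatch is used so that the whole string must match, not just part of it.

```python
import re

# The pattern from the text. Each parenthesised group is made optional by "?".
OPTIONAL_GROUPS = re.compile(r"reg(ular)?exp(ressions)?")

# Illustrative inputs: the first four should match, the last two should not.
candidates = [
    "regexp",
    "regularexp",
    "regexpressions",
    "regularexpressions",
    "regex",              # missing the required "p"
    "regularexpression",  # "ression" is not the full optional group
]

# fullmatch succeeds only if the entire string fits the pattern.
group_results = {s: OPTIONAL_GROUPS.fullmatch(s) is not None for s in candidates}

for text, matched in group_results.items():
    print(f"{text}: {'match' if matched else 'no match'}")
```

Other regex engines may differ in small ways, but grouping with ( ) and the ? quantifier behave this way in most common flavours.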

There are 4 ways of specifying a quantity in the pattern, and these can be described as follows:

?
This will match 0 or 1 of the preceding pattern. For example, b(a)? or ba? would match b or ba.
*
This will match 0 or more of the preceding pattern. For example, b(a)* or ba* will match b, ba, baa, baaa, etc.
+
This will match 1 or more of the preceding pattern. For example, b(a)+ or ba+ will match ba, baa, baaa, etc.
{1} or {3,8}
This is used to specify an exact number of times, or a range of times, that the preceding pattern can occur. So a{1} will look for exactly one a, and a{3,4} will look for 3 or 4 a's.
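The four quantifiers above can be compared side by side. This is a minimal sketch, again assuming Python's re module; the helper whole_match and the test strings are hypothetical and exist only for this illustration.

```python
import re


def whole_match(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# "?" allows 0 or 1 of the preceding item.
optional = [whole_match(r"ba?", s) for s in ("b", "ba", "baa")]

# "*" allows 0 or more of the preceding item.
star = [whole_match(r"ba*", s) for s in ("b", "ba", "baaa")]

# "+" requires 1 or more of the preceding item.
plus = [whole_match(r"ba+", s) for s in ("b", "ba", "baaa")]

# "{3,4}" requires between 3 and 4 of the preceding item.
counted = [whole_match(r"a{3,4}", s) for s in ("aa", "aaa", "aaaa", "aaaaa")]

print("ba?    ", optional)  # [True, True, False]
print("ba*    ", star)      # [True, True, True]
print("ba+    ", plus)      # [False, True, True]
print("a{3,4} ", counted)   # [False, True, True, False]
```

Note that these results rely on whole-string matching. A search for a{3,4} inside a longer string such as aaaaa would still find a run of a's within it.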

Using parentheses to encapsulate a pattern is important in many cases: it lets us apply a quantifier to more than just the single preceding character, and it also lets us offer a choice between alternatives.

a(ba|ab)?a

In the above example, (ba|ab) matches either ba or ab, and the question mark then makes that group occur 0 or 1 times. So the above pattern will match abaa or aaba, and also aa, where the optional group is absent.
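The alternation example can be verified the same way. The sketch below assumes Python's re module; the name ALTERNATION and the sample strings are illustrative only.

```python
import re

# "a", then optionally either "ba" or "ab", then "a".
ALTERNATION = re.compile(r"a(ba|ab)?a")

samples = ["aa", "abaa", "aaba", "aba", "ababa"]

# fullmatch requires the entire string to fit the pattern.
alternation_results = {s: ALTERNATION.fullmatch(s) is not None for s in samples}

for text, matched in alternation_results.items():
    print(f"{text}: {'match' if matched else 'no match'}")
```

Here aa, abaa and aaba match, while aba and ababa do not, because neither can be split into a, an optional ba or ab, and a final a.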