Alternation is the regular expression term for a simple “OR”.
In a regular expression it is denoted with a vertical line character |.
The corresponding regexp:
A usage example:
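As a hedged illustration of such usage, here is a minimal sketch using Python's re module. The word list and the sample sentence are assumptions chosen for this sketch, not necessarily the article's own example.

```python
import re

# Assumed word list: match any one of several language names.
# (?:script)? makes the "script" suffix optional without creating a capture group,
# so findall() returns whole matches rather than group contents.
pattern = re.compile(r"html|php|css|java(?:script)?", re.IGNORECASE)

text = "First HTML appeared, then CSS, then JavaScript"
matches = pattern.findall(text)
print(matches)  # ['HTML', 'CSS', 'JavaScript']
```

Each alternative is tried at every position, so any of the listed words can match wherever it occurs in the text.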
We already saw a similar thing: square brackets. They allow choosing between multiple characters; for instance, gr[ae]y matches gray or grey.
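Square brackets allow only characters or character classes. Alternation allows arbitrary expressions: a regexp A|B|C means one of the expressions A, B or C.
For instance, gr(a|e)y means exactly the same as gr[ae]y, whereas gra|ey (without parentheses) means gra or ey.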
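The difference between square brackets, grouped alternation and ungrouped alternation can be checked with a small sketch. It assumes Python's re module, and the word list is made up for illustration.

```python
import re

# Illustrative inputs (an assumption of this sketch).
words = ["gray", "grey", "griy", "gra", "ey"]

bracket = re.compile(r"gr[ae]y")    # one character out of a set
grouped = re.compile(r"gr(a|e)y")   # alternation limited by parentheses
ungrouped = re.compile(r"gra|ey")   # alternation over the whole pattern

# fullmatch() succeeds only if the entire word matches the pattern.
bracket_hits = [w for w in words if bracket.fullmatch(w)]
grouped_hits = [w for w in words if grouped.fullmatch(w)]
ungrouped_hits = [w for w in words if ungrouped.fullmatch(w)]

print(bracket_hits)    # ['gray', 'grey']
print(grouped_hits)    # ['gray', 'grey']
print(ungrouped_hits)  # ['gra', 'ey']
```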
To apply alternation to a chosen part of the pattern, we can enclose it in parentheses:
- I love HTML|CSS matches I love HTML or CSS.
- I love (HTML|CSS) matches I love HTML or I love CSS.
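A minimal sketch of how parentheses change the scope of the alternation, assuming Python's re module and a made-up sample sentence:

```python
import re

text = "I love HTML and I love CSS"

# Without parentheses the alternatives are "I love HTML" and "CSS".
whole = [m.group(0) for m in re.finditer(r"I love HTML|CSS", text)]

# With parentheses the alternation applies only to the last word.
part = [m.group(0) for m in re.finditer(r"I love (HTML|CSS)", text)]

print(whole)  # ['I love HTML', 'CSS']
print(part)   # ['I love HTML', 'I love CSS']
```

finditer() with group(0) is used here instead of findall(), because findall() would return only the contents of the capturing group for the second pattern.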
In previous articles there was a task to build a regexp for searching time in the form hh:mm, for instance 12:00. But a simple \d\d:\d\d is too vague. It accepts 25:99 as the time (99 minutes match the pattern, but that time is invalid).
How can we make a better pattern?
We can use more careful matching. First, the hours:
- If the first digit is 0 or 1, then the next digit can be any: [01]\d.
- Otherwise, if the first digit is 2, then the next must be [0-3].
- (no other first digit is allowed)
We can write both variants in a regexp using alternation: [01]\d|2[0-3].
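To check the hours rule, the sketch below tests the pattern [01]\d|2[0-3] against every two-digit string from 00 to 99. It assumes Python's re module; the variable names are illustrative.

```python
import re

# Hours: first digit 0 or 1 followed by any digit, or 2 followed by 0-3.
hours = re.compile(r"[01]\d|2[0-3]")

# Try every two-digit string and keep those the pattern fully matches.
candidates = [f"{n:02d}" for n in range(100)]
accepted = [h for h in candidates if hours.fullmatch(h)]

print(accepted[0], accepted[-1], len(accepted))  # 00 23 24
```

Exactly the 24 values from 00 to 23 are accepted, so 24 and above are rejected.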
Next, minutes must be from 00 to 59. In the regular expression language that can be written as [0-5]\d: the first digit is 0-5, and then any digit.
If we glue hours and minutes together, we get the pattern: [01]\d|2[0-3]:[0-5]\d.
We’re almost done, but there’s a problem. The alternation | now happens to be between [01]\d and 2[0-3]:[0-5]\d.
That is: minutes are added to the second alternation variant, here’s a clear picture:
[01]\d | 2[0-3]:[0-5]\d
That pattern looks for [01]\d or 2[0-3]:[0-5]\d.
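The problem can be seen by running the unparenthesized pattern [01]\d|2[0-3]:[0-5]\d on a few sample times. This is a sketch assuming Python's re module; the test string is made up for illustration.

```python
import re

# Alternation spans the whole pattern: "[01]\d" OR "2[0-3]:[0-5]\d".
broken = re.compile(r"[01]\d|2[0-3]:[0-5]\d")

text = "00:00 10:10 23:59 25:99 1:2"
found = broken.findall(text)

# Times starting with 0 or 1 are cut into bare two-digit pieces,
# because the first alternative carries no minutes part.
print(found)  # ['00', '00', '10', '10', '23:59']
```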
But that’s wrong: the alternation should only be used in the “hours” part of the regular expression, to allow [01]\d OR 2[0-3]. Let’s correct that by enclosing “hours” in parentheses: ([01]\d|2[0-3]):[0-5]\d.
The final solution:
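A minimal sketch of the pattern ([01]\d|2[0-3]):[0-5]\d in use, assuming Python's re module and a made-up test string:

```python
import re

# Hours are grouped, so the alternation no longer swallows the minutes part.
time_re = re.compile(r"([01]\d|2[0-3]):[0-5]\d")

text = "00:00 10:10 23:59 25:99 1:2"

# group(0) is the whole match; findall() would return only the hours group.
times = [m.group(0) for m in time_re.finditer(text)]

print(times)  # ['00:00', '10:10', '23:59']
```

The invalid 25:99 and the malformed 1:2 are both skipped.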