Let’s say we have a string like +7(903)-123-45-67 and want to find all numbers in it. But unlike before, we are interested not in single digits, but in full numbers: 7, 903, 123, 45, 67.
A number is a sequence of 1 or more digits \d. To mark how many we need, we can append a quantifier.
Quantity {n}
The simplest quantifier is a number in curly braces: {n}.
A quantifier is appended to a character (or a character class, or a [...] set, etc.) and specifies how many we need.
It has a few advanced forms, let’s see examples:
- The exact count: {5}
  \d{5} denotes exactly 5 digits, the same as \d\d\d\d\d.
  The example below looks for a 5-digit number:
  alert( "I'm 12345 years old".match(/\d{5}/) ); // "12345"
  We can add \b to exclude longer numbers: \b\d{5}\b.
- The range: {3,5}, match 3-5 times
  To find numbers from 3 to 5 digits we can put the limits into curly braces: \d{3,5}
  alert( "I'm not 12, but 1234 years old".match(/\d{3,5}/) ); // "1234"
  We can omit the upper limit. Then a regexp \d{3,} looks for sequences of digits of length 3 or more:
  alert( "I'm not 12, but 345678 years old".match(/\d{3,}/) ); // "345678"
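To see why the word boundaries matter, here is a small sketch comparing \d{5} with \b\d{5}\b. The sample strings and variable names are made up for illustration, and console.log is used instead of alert so the snippet also runs outside a browser.

```javascript
// Illustrative strings (not from the article): a 7-digit and a 5-digit number.
const longNumber = "Order 1234567 shipped";
const exactNumber = "Order 12345 shipped";

const loose = /\d{5}/;      // any 5 digits in a row, even inside a longer number
const strict = /\b\d{5}\b/; // 5 digits with a word boundary on both sides

const looseInLong = longNumber.match(loose);     // takes the first 5 digits
const strictInLong = longNumber.match(strict);   // no 5-digit number here
const strictInExact = exactNumber.match(strict); // the whole number

console.log(looseInLong[0]);   // 12345
console.log(strictInLong);     // null
console.log(strictInExact[0]); // 12345
```

Without \b the pattern happily cuts a piece out of a longer number, which is rarely what we want when searching for "a 5-digit number".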
Let’s return to the string +7(903)-123-45-67.
A number is a sequence of one or more digits in a row. So the regexp is \d{1,}:
let str = "+7(903)-123-45-67";
let numbers = str.match(/\d{1,}/g);
alert(numbers); // 7,903,123,45,67
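For comparison, a quick sketch of what the quantifier changes: the same string searched with a bare \d and with \d{1,}. The variable names are illustrative, and console.log stands in for alert so the code also runs outside a browser.

```javascript
const phone = "+7(903)-123-45-67";

// Without a quantifier, every digit is a separate match.
const digits = phone.match(/\d/g);

// With {1,}, each run of adjacent digits becomes one match.
const fullNumbers = phone.match(/\d{1,}/g);

console.log(digits.join(","));      // 7,9,0,3,1,2,3,4,5,6,7
console.log(fullNumbers.join(",")); // 7,903,123,45,67
```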
Shorthands
There are shorthands for the most commonly used quantifiers:
- + means “one or more”, the same as {1,}.
  For instance, \d+ looks for numbers:
  let str = "+7(903)-123-45-67";
  alert( str.match(/\d+/g) ); // 7,903,123,45,67
- ? means “zero or one”, the same as {0,1}. In other words, it makes the symbol optional.
  For instance, the pattern ou?r looks for o followed by zero or one u, and then r.
  So, colou?r finds both color and colour:
  let str = "Should I write color or colour?";
  alert( str.match(/colou?r/g) ); // color, colour
- * means “zero or more”, the same as {0,}. That is, the character may repeat any number of times or be absent.
  For example, \d0* looks for a digit followed by any number of zeroes:
  alert( "100 10 1".match(/\d0*/g) ); // 100, 10, 1
  Compare it with + (one or more):
  alert( "100 10 1".match(/\d0+/g) ); // 100, 10
  // 1 not matched, as 0+ requires at least one zero
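As a sanity check, the sketch below runs each shorthand side by side with its curly-brace equivalent on the strings from the examples above. The pairs array and the results variable are illustrative names, and console.log replaces alert so the code also runs outside a browser.

```javascript
// Each entry: [shorthand regexp, equivalent {n,m} form, test string]
const pairs = [
  [/\d+/g, /\d{1,}/g, "+7(903)-123-45-67"],
  [/colou?r/g, /colou{0,1}r/g, "Should I write color or colour?"],
  [/\d0*/g, /\d0{0,}/g, "100 10 1"],
];

// Collect the matches of both forms for every test string.
const results = pairs.map(([shorthand, explicit, text]) => ({
  shorthand: text.match(shorthand),
  explicit: text.match(explicit),
}));

for (const { shorthand, explicit } of results) {
  console.log(shorthand.join(", "), "|", explicit.join(", "));
}
// 7, 903, 123, 45, 67 | 7, 903, 123, 45, 67
// color, colour | color, colour
// 100, 10, 1 | 100, 10, 1
```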
More examples
Quantifiers are used very often. They serve as the main “building block” of complex regular expressions, so let’s see more examples.
- Regexp “decimal fraction” (a number with a floating point): \d+\.\d+
  In action:
  alert( "0 1 12.345 7890".match(/\d+\.\d+/g) ); // 12.345
- Regexp “opening HTML tag without attributes”, like <span> or <p>: /<[a-z]+>/i
  In action:
  alert( "<body> ... </body>".match(/<[a-z]+>/gi) ); // <body>
  We look for the character '<' followed by one or more English letters, and then '>'.
- Regexp “opening HTML tag without attributes” (improved): /<[a-z][a-z0-9]*>/i
  Better regexp: according to the standard, an HTML tag name may have a digit at any position except the first one, like <h1>.
  alert( "<h1>Hi!</h1>".match(/<[a-z][a-z0-9]*>/gi) ); // <h1>
- Regexp “opening or closing HTML tag without attributes”: /<\/?[a-z][a-z0-9]*>/i
  We added an optional slash /? before the tag name. We had to escape it with a backslash, otherwise JavaScript would think it is the end of the pattern.
  alert( "<h1>Hi!</h1>".match(/<\/?[a-z][a-z0-9]*>/gi) ); // <h1>, </h1>
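The last pattern can be tried on a slightly larger fragment. The sketch below uses a made-up HTML string to show that tags with attributes are skipped, as the “without attributes” wording implies. The variable names are illustrative, and console.log replaces alert so the code also runs outside a browser.

```javascript
// Illustrative markup: one tag with an attribute, the rest without.
const html = '<div class="box"><h1>Hi!</h1><br></div>';

// Opening or closing tag, no attributes allowed between the name and '>'.
const tagRegexp = /<\/?[a-z][a-z0-9]*>/gi;

const tags = html.match(tagRegexp);

// <div class="box"> is not matched: after "div" comes a space, not '>'.
console.log(tags.join(", ")); // <h1>, </h1>, <br>, </div>
```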
We can see one common rule in these examples: the more precise the regular expression, the longer and more complex it is.
For instance, for HTML tags we could use a simpler regexp: <\w+>.
…But because \w means any English letter or a digit or '_', the regexp also matches non-tags, for instance <_>. So it’s much simpler than <[a-z][a-z0-9]*>, but less reliable.
Are we OK with <\w+>, or do we need <[a-z][a-z0-9]*>?
In real life both variants are acceptable. It depends on how tolerant we can be of “extra” matches and whether it’s difficult or not to filter them out by other means.
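The trade-off can be sketched in code. The sample string, the knownTags allow-list and all variable names below are hypothetical, chosen only to illustrate one possible way of filtering “extra” matches. console.log is used instead of alert so the code also runs outside a browser.

```javascript
// Illustrative input: two real tag names and two non-tags.
const sample = "<p> <_> <h1> <1a>";

// Simple but tolerant: \w also accepts '_' and a leading digit.
const looseTags = sample.match(/<\w+>/g);

// Longer but more precise: a letter first, then letters or digits.
const strictTags = sample.match(/<[a-z][a-z0-9]*>/gi);

// "Other means": keep only names from a (hypothetical) allow-list.
const knownTags = new Set(["p", "h1", "span"]);
const filteredTags = looseTags.filter(
  (tag) => knownTags.has(tag.slice(1, -1).toLowerCase())
);

console.log(looseTags.join(", "));    // <p>, <_>, <h1>, <1a>
console.log(strictTags.join(", "));   // <p>, <h1>
console.log(filteredTags.join(", ")); // <p>, <h1>
```

If such a filter is cheap to write and already needed for other reasons, the simpler regexp may be good enough; otherwise the stricter pattern saves the extra step.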