Regular Expression

Regular Expression is used to search and manipulate text based on patterns in javascript. Here are the basic that we need to know in order to proceed further.

Regular Expression Basic

A normal regular expression will look like this

/abc/gi

The above code will match "vsadfABCasd", "abcScw" or any combination that contain 'abc' regardless of case sensitive due to 'gi' given at the end of the pattern where the meaning are describe below,

ModifierDescription
gDo global pattern matching.
iDo case-insensitive pattern matching.

Some characters don't match themselves, but are metacharacters. You can match these characters literally by placing a backslash in front of them. For example, "" matches a backslash and "$" matches a dollar-sign. Here's the list of metacharacters:

 | () [ { ^ $ * + ? .

A backslash also turns an alphanumeric character into a metacharacter. So whenever you see a backslash followed by an alphanumeric character:

d D w W t s 3

you can get the meaning of the above metacharacter and a list of javascript regular express methods from here.

/Homer|Marge|Bart|Lisa|Maggie/

Any of those strings can trigger a match. That is, the preceding expression matches all of the following strings:

  • "Homer"
  • "Bart"
  • "Lisa Simpson"
  • "Simpson, Marge"

You can group various sorts with parentheses, as in the following expression:

/(Homer|Marge|Bart|Lisa|Maggie) Simpson/

this way it will match only "Lisa Simpson".

Defining Regular expression

One way to define a regular expression is to simply assign it to a variable:

var varName = /PATTERN/[g|i|gi];

Here's an example:

var child = /(Bart|Lisa|Maggie) Simpson/i;

Another way to create a regular expression is to define it as an instance of the global RegExp object:

var varName = new RegExp("PATTERN", ["g"|"i"|"gi"]);

Once again, the modifiers are optional. Now, take a look at the same regular expression defined in another fashion:

var child = new RegExp("(Bart|Lisa|Maggie) Simpson", "i");

Quantifiers

Quantifiers say how many of the previous substring should match in a row. Here are a few quantifiers:

* + ? {4,8} {5,}

Quantifiers can only be put after atoms, assertions with width. They attach only to the previous atom, so if you want a quantifier to apply to multiple characters, you must group them together, like this:

/(Bart){3}/

This pattern matches "BartBartBart", whereas the following pattern matches the string "Barttt":

/Bart{3}/

Greedy and Non-Greedy

i found this very useful when dealing with more than one exact match. '*', '+', and '?' qualifiers are all greedy; they match as much text as possible which is not desired sometimes; if <.*> is matched against '

title

', it will match the entire string, and not just '

'. So if we just want '

' we can add'?' after the qualifier to make it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched.

Lookaheads

A lookahead matches only if the preceding subexpression is followed by the pattern, but the pattern is not part of the match. The subexpression is the part of the regular expression which will be matched. I found this pretty useful when i want to exclude certain words during pattern matching.

(?=pattern) matches only if there is a following pattern in input.
(?!pattern) matches only if there is not a following pattern in input.

E.g: /Vista(?=pro)/ matches 'Vista' only if 'Vista' is followed by 'pro'.

E.g: /Vista(?!pro)/ matches 'Vista' only if 'Vista' is not followed by 'pro'.

Backreferences

Backreferences are references to the same thing as a previously captured match. n is a positive nonzero integer telling the browser which captured match to reference to.

/(S)1(1)+/g matches all occurrences of three equal non-whitespace characters following each other.
/<(S+).*>(.*)/ matches any tag.

E.g: /<(S+).*>(.*)/ matches '

text
' in "text
text
text".

Character Set

Matches any of the contained characters. A range of characters may be defined by using a hyphen.

[characters] matches any of the contained characters.
[^characters] negates the character set and matches all but the contained characters

E.g: /[abcd]/ matches any of the characters 'a', 'b', 'c', 'd' and may be abbreviated to /[a-d]/. Ranges must be in ascending order, otherwise they will throw an error. (E.g: /[d-a]/ will throw an error.)
/[^0-9]/ matches all characters but digits.

Example

The following regular expression matches something similar to an HTML tag.

/<(.*)>.*</1>/

so "
JAVASCRIPT
" will be a match!

The following regular expression matches matches a string (at least four-characters long) whose first two characters are also its last two characters, but in reverse order.

/^(.)(.).*21$/

so "abcdefba" will be a match!
The following regular expression will swap the first two words of a string

s/(S+)s+(S+)/$2 $1/

In JavaScript this would be:

str = str.replace(/(S+)s+(S+)/, "$2 $1");

here shows how some of the regular express methods are used.

Share