Regular Expressions in JavaScript

Posted on 08 Sep 2002

in Code

by David Andersson (liorean)

What is a regular expression?

Regular expressions are a form of pattern matching that you can apply to textual content. Take for example the DOS wildcards ? and *, which you can use when you're searching for a file. They are a very limited subset of what RegExp can do. For instance, if you want to find all files beginning with "fn", followed by 1 to 4 arbitrary characters, and ending with "ht.txt", you can't do that with the usual DOS wildcards. RegExp, on the other hand, can handle that and much more complicated patterns.
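As a rough sketch of the file name example above, the pattern below accepts names that begin with "fn", continue with 1 to 4 arbitrary characters and end with "ht.txt". The file names are made up for illustration, and the syntax used in the pattern is explained further down.

```javascript
// Hypothetical file names, used only for illustration.
var files = ["fnAht.txt", "fnABCDht.txt", "fnht.txt", "fnABCDEht.txt", "xfnABht.txt"];

// ^fn        the name starts with "fn"
// .{1,4}     followed by 1 to 4 characters of any kind
// ht\.txt$   and ends with "ht.txt" (the dot is escaped to make it literal)
var pattern = /^fn.{1,4}ht\.txt$/;

var matching = [];
for (var i = 0; i < files.length; i++) {
  if (pattern.test(files[i])) {
    matching.push(files[i]);
  }
}

console.log(matching); // ["fnAht.txt", "fnABCDht.txt"]
```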

Regular expressions are, in short, a way to effectively handle data, search and replace strings, and provide extended string handling. Often a single regular expression can do work that the built-in string methods and properties could only do if you wrapped them in a complicated function or loop.

RegExp Syntax

There are two ways of defining regular expressions in JavaScript: one through an object constructor and one through a literal. The object can be changed at runtime, while the literal is compiled when the script is loaded and provides better performance. The literal is the best choice for patterns known in advance, while the constructor is better for dynamically constructed regular expressions such as those built from user input. In almost all cases you can use either way to define a regular expression, and it will be handled in exactly the same way no matter how you declare it.

Declaration

Here are the ways to declare a regular expression in JavaScript. While other languages such as PHP or VBScript use other delimiters, in JavaScript you use forward slash (/) when you declare RegExp literals.

Syntax	Example
RegExp Literal
/pattern/flags;	var re = /mac/i;
RegExp Object Constructor
new RegExp("pattern","flags");	var re = new RegExp(window.prompt("Please input a regex.","yes|yeah"),"g");
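To make the difference between the two declarations concrete, here is a minimal sketch. The variable `word` is a hypothetical stand-in for a value that is only known at runtime, such as user input. Note that a pattern passed to the constructor is an ordinary string, so each backslash has to be doubled.

```javascript
// A literal: the pattern is written directly between forward slashes.
var literal = /\d+\.\d+/;

// The same pattern through the constructor. The pattern is now a string,
// so every backslash must be written twice.
var constructed = new RegExp("\\d+\\.\\d+");

// Building a pattern at runtime. `word` is a hypothetical stand-in
// for something like user input.
var word = "mac";
var dynamic = new RegExp(word, "i");

console.log(literal.test("Version 4.5"));           // true
console.log(constructed.test("Version 4.5"));       // true
console.log(literal.source === constructed.source); // true
console.log(dynamic.test("Macintosh"));             // true
```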

Flags

There are three flags that you may use on a RegExp. The multiline flag is poorly supported in older browsers, but the other two are supported in pretty much every browser that can handle RegExp. These flags can be used in any order or combination, and are an integral part of the RegExp.

Flag	Description
Global Search
`g`	The global search flag makes the RegExp search for a pattern throughout the string, creating an array of all occurrences it can find matching the given pattern.
Ignore Case
`i`	The ignore case flag makes a regular expression case insensitive. For international coders, note that this might not work on extended characters such as å, ü, ñ, æ.
Multiline Input
`m`	This flag makes the beginning of input (`^`) and end of input (`$`) codes also catch beginning and end of line respectively.
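The effect of each flag is easiest to see side by side. The following is a small sketch using `String.match`, with a sample string made up for the purpose.

```javascript
// A made-up three-line sample string.
var text = "Mac one\nmac two\nMAC three";

// No flags: only the first occurrence is found, and case matters.
var first = text.match(/mac/);          // "mac" at index 8

// i: case is ignored, so the first match is now "Mac" at index 0.
var anyCase = text.match(/mac/i);

// g and i combined: every occurrence, regardless of case.
var all = text.match(/mac/gi);          // ["Mac", "mac", "MAC"]

// Without m, ^ only matches at the beginning of the input.
var inputStart = text.match(/^mac/gi);  // ["Mac"]

// With m, ^ also matches at the beginning of each line.
var lineStarts = text.match(/^mac/gim); // ["Mac", "mac", "MAC"]
```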

Pattern

The patterns used in RegExp can be very simple, or very complicated, depending on what you're trying to accomplish. To match a simple string like "Hello World!" is no harder than actually writing the string, but if you want to match an e-mail address or HTML tag, you might end up with a very complicated pattern that will use most of the syntax presented in the table below.

Pattern	Description
Escaping
`\`	Escapes special characters to literal and literal characters to special. E.g: `/\(s\)/` matches '(s)' while `/(\s)/` matches any whitespace and captures the match.
Quantifiers
{n}, {n,}, {n,m}, *, +, ?	Quantifiers match the preceding subpattern a certain number of times. The subpattern can be a single character, an escape sequence, a pattern enclosed by parentheses or a character set. {n} matches exactly n times. {n,} matches n or more times. {n,m} matches n to m times. `*` is short for `{0,}` and matches zero or more times. `+` is short for `{1,}` and matches one or more times. `?` is short for `{0,1}` and matches zero or one time. E.g: `/o{1,3}/` matches 'oo' in "tooth" and 'o' in "nose".
Pattern delimiters
(pattern), (?:pattern)	Matches entire contained pattern. (pattern) captures the match. (?:pattern) doesn't capture the match. E.g: `/(d).\1/` matches 'dad' in "abcdadef" and captures 'd', while `/(?:.d){2}/` matches 'cdad' but doesn't capture anything. Note: (?:pattern) is very badly supported in older browsers.
Lookaheads
(?=pattern), (?!pattern)	A lookahead matches only if the preceding subexpression is followed by the pattern, but the pattern is not part of the match. The subexpression is the part of the regular expression which will be matched. (?=pattern) matches only if the pattern follows in the input. (?!pattern) matches only if the pattern does not follow in the input. E.g: `/Win(?=98)/` matches 'Win' only if 'Win' is followed by '98'. Note: Support for lookaheads is lacking in all but the newest browsers.
Alternation
`|`	Alternation matches content on either side of the alternation character. E.g: `/(a|b)a/` matches 'aa' in "dseaas" and 'ba' in "acbab".
Character sets
[characters], [^characters]	A range of characters may be defined by using a hyphen. [characters] matches any of the contained characters. [^characters] negates the character set and matches all but the contained characters. E.g: `/[abcd]/` matches any of the characters 'a', 'b', 'c', 'd' and may be abbreviated to `/[a-d]/`. Ranges must be in ascending order, otherwise they will throw an error. (E.g: `/[d-a]/` will throw an error.) `/[^0-9]/` matches all characters but digits. Note: Most special characters are automatically escaped to their literal meaning in character sets.
Special characters
`^`, `$`, `.`, `?` and all the special characters listed above in the table.	Special characters are characters that match something other than what they appear as. `^` matches beginning of input (or beginning of line with m flag). `$` matches end of input (or end of line with m flag). `.` matches any character except a newline. `?` directly following a quantifier makes the quantifier non-greedy (makes it match the minimum instead of the maximum of the interval defined). E.g: `/(.*?)/` matches the empty string '' in all strings, because the non-greedy quantifier takes as few characters as possible. Note: Non-greedy matches are not supported in older browsers such as Netscape Navigator 4 or Microsoft Internet Explorer 5.0.
Literal characters
All characters except those with special meaning.	Mapped directly to the corresponding character. E.g: `/a/` matches 'a' in "Any ancestor".
Backreferences
\n	Backreferences are references to the same thing as a previously captured match. n is a positive nonzero integer telling the browser which captured match to refer to. `/(\S)\1(\1)+/g` matches all occurrences of three or more equal non-whitespace characters following each other. `/<(\S+).*>(.*)<\/\1>/` matches an opening tag, its content and the closing tag with the same name. E.g: it matches '<div id="me">text</div>' in "text<div id=\"me\">text</div>text".
Character Escapes
`\f`, `\r`, `\n`, `\t`, `\v`, `\0`, `[\b]`, `\s`, `\S`, `\w`, `\W`, `\d`, `\D`, `\b`, `\B`, `\cX`, `\xhh`, `\uhhhh`	`\f` matches form-feed. `\r` matches carriage return. `\n` matches linefeed. `\t` matches horizontal tab. `\v` matches vertical tab. `\0` matches NUL character. `[\b]` matches backspace. `\s` matches whitespace (short for `[ \f\n\r\t\v\u00A0\u2028\u2029]`). `\S` matches anything but a whitespace (short for `[^ \f\n\r\t\v\u00A0\u2028\u2029]`). `\w` matches any alphanumerical character (word characters) including underscore (short for `[a-zA-Z0-9_]`). `\W` matches any non-word characters (short for `[^a-zA-Z0-9_]`). `\d` matches any digit (short for `[0-9]`). `\D` matches any non-digit (short for `[^0-9]`). `\b` matches a word boundary (the position between a word character and a non-word character, such as a space). `\B` matches a non-word boundary (any position where `\b` does not match). `\cX` matches a control character. E.g: `\cM` matches control-M. `\xhh` matches the character with the two-digit hexadecimal code hh. `\uhhhh` matches the Unicode character with the four-digit hexadecimal code hhhh.
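The examples from the table can be tried out directly. The following sketch runs a selection of them through `exec` and `test` and notes the result of each in a comment. The input strings that do not appear in the table are made up for illustration.

```javascript
// Escaping: \( and \) match literal parentheses.
var escaped = /\(s\)/.exec("plural(s)");         // ["(s)"]

// Quantifier: between one and three o's, as many as possible.
var quantified = /o{1,3}/.exec("tooth");         // ["oo"]

// Capturing and non-capturing groups.
var captured = /(d).\1/.exec("abcdadef");        // ["dad", "d"]
var uncaptured = /(?:.d){2}/.exec("abcdadef");   // ["cdad"]

// Lookaheads: "98" is checked for, but is not part of the match.
var ahead = /Win(?=98)/.exec("Win98");           // ["Win"]
var noAhead = /Win(?=98)/.exec("Win95");         // null
var negative = /Win(?!98)/.exec("Win95");        // ["Win"]

// Alternation: either 'a' or 'b', followed by 'a'.
var alternated = /(a|b)a/.exec("acbab");         // ["ba", "b"]

// Negated character set: one or more non-digits.
var nonDigits = /[^0-9]+/.exec("2002 Sep");      // [" Sep"]

// Greedy versus non-greedy quantifiers.
var greedy = /<.+>/.exec("<b>bold</b>");         // ["<b>bold</b>"]
var lazy = /<.+?>/.exec("<b>bold</b>");          // ["<b>"]

// Backreference: the closing tag must repeat the captured tag name.
var tag = /<(\S+).*>(.*)<\/\1>/.exec('text<div id="me">text</div>text');
// tag[0] is '<div id="me">text</div>', tag[1] is "div", tag[2] is "text"

// Word boundaries: "cat" as a whole word only.
var wholeWord = /\bcat\b/.test("the cat sat");   // true
var insideWord = /\bcat\b/.test("concatenate");  // false
```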

Usage

Now, knowing how a RegExp is written is only half the game. To gain anything from them you have to know how to use them too. There are a number of ways to apply a RegExp, some through methods belonging to the String object, some through methods belonging to the RegExp object. Whether the regular expression is declared through an object constructor or a literal makes no difference as to the usage.

Description	Example
RegExp.exec(string)
Applies the RegExp to the given string, and returns the match information, or null if there is no match.	`var match = /s(amp)le/i.exec("Sample text");` `match` then contains `["Sample","amp"]`.
RegExp.test(string)
Tests if the given string matches the RegExp, and returns true if matching, false if not.	`var match = /sample/.test("Sample text");` `match` then contains `false`.
String.match(pattern)
Matches the given string against the RegExp. With the `g` flag it returns an array containing all the matches, without the `g` flag it returns just the first match. If no match is found it returns null.	`var str = "Watch out for the rock!".match(/r?or?/g);` `str` then contains `["o","or","ro"]`.
String.search(pattern)
Matches the RegExp against the string and returns the index of the beginning of the match if found, -1 if not.	`var ndx = "Watch out for the rock!".search(/for/);` `ndx` then contains `10`.
String.replace(pattern,string)
Replaces matches with the given string, and returns the edited string.	`var str = "Liorean said: My name is Liorean!".replace(/Liorean/g,'Big Fat Dork');` `str` then contains `"Big Fat Dork said: My name is Big Fat Dork!"`.
String.split(pattern)
Cuts a string into an array, making cuts at matches.	`var str = "I am confused".split(/\s/g);` `str` then contains `["I","am","confused"]`.
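The methods above can also be combined. The sketch below goes slightly beyond the table and relies on two behaviours not described there: with the `g` flag, repeated calls to `exec` continue from where the previous match ended and return null when no matches are left, and `$1`, `$2` in a replacement string stand for the captured matches. The input strings are made up for illustration.

```javascript
// A made-up settings string.
var input = "width=100; height=200; depth=30";

// With the g flag, each call to exec continues after the previous match
// and returns null once there are no more matches.
var pair = /(\w+)=(\d+)/g;
var sizes = {};
var m;
while ((m = pair.exec(input)) !== null) {
  sizes[m[1]] = Number(m[2]); // m[1] is the name, m[2] is the number
}
// sizes is now { width: 100, height: 200, depth: 30 }

// In a replacement string, $1 and $2 refer to the captured matches.
var swapped = "Andersson, David".replace(/(\w+), (\w+)/, "$2 $1");
// swapped is "David Andersson"

// split with a pattern: cut at each semicolon and any whitespace after it.
var parts = input.split(/;\s*/);
// parts is ["width=100", "height=200", "depth=30"]
```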

Liorean is a twenty-one-year-old medical student and hobbyist web designer mostly working with JavaScript and CSS, DOM and the newest HTML standards available. His personal dwelling on the net can be found at liorean@web-graphics.com.
