String and regular expressions

The JavaScript String class has a few methods that can be used with a RegExp. Here is a brief introduction to four of them: split(), search(), match(), and replace().

split()

Regular expressions make String.split() more flexible:
"1, 2, 3".split(", "); // 1
"1 ,  2 ,3".split(/\s*,\s*/); // 2
1. This is the usual way to call split(). When we know exactly what the separator looks like, it works fine and returns an array of three elements containing just the numbers. But what if we have no control over the blanks, and there could be none, or many, around each comma? A literal separator would then leave stray blanks in the elements, or miss some commas altogether.
2. This is the solution. The comma is the real separator, and any blanks around it are consumed as part of the match.
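
To make the difference concrete, here is a minimal sketch that runs both kinds of separator against the same untidy input. The variable names are mine, chosen only for illustration.

```javascript
// Input with inconsistent spacing around the commas (same as example 2 above).
const raw = "1 ,  2 ,3";

// A literal ", " only matches a comma followed by exactly one blank,
// so the second comma is missed and stray blanks are left behind.
const byString = raw.split(", ");

// The pattern treats any blanks around a comma as part of the separator.
const byRegExp = raw.split(/\s*,\s*/);

console.log(byString); // [ '1 ', ' 2 ,3' ]
console.log(byRegExp); // [ '1', '2', '3' ]
```

If the elements are meant to be numbers, a further `byRegExp.map(Number)` would convert them.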

search()

Calling search() on a string with a RegExp returns an integer: the index of the first match, or -1 if there is no match:
"JavaScript".search(/script/); // 1
"JavaScript".search(/Script/); // 2
"JavaScript".search(/script/i); // 3
"Here is 1 for you, 3 for me, 42 for the others.".search(/\d+/g); // 4
1. The search is by default case sensitive, so the pattern here is not found, and the function call returns -1.
2. The match is found, starting at index 4.
3. We can override the default and ask for a case-insensitive search by specifying the "i" flag after the pattern. This function call returns 4.
4. The global flag makes no sense in this context, and it is silently ignored. The returned value is the position of the first match, 8.
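
As a small sketch of point 4, calling search() twice with the same global RegExp gives the same index both times, since search() always starts from the beginning of the string. The last line shows the common idiom of comparing the result with -1 to get a yes/no answer. Variable names are illustrative only.

```javascript
const text = "Here is 1 for you, 3 for me, 42 for the others.";
const digits = /\d+/g;

// The "g" flag does not make search() advance to the next match.
const first = text.search(digits);
const second = text.search(digits);

// Comparing with -1 turns the index into a simple found / not found test.
const found = "JavaScript".search(/script/i) !== -1;
const missing = "JavaScript".search(/script/) !== -1;

console.log(first, second);  // 8 8
console.log(found, missing); // true false
```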

match()

The behavior of match() is more intricate: its output has to be interpreted differently depending on the RegExp it receives. Seeing it in action should clarify what I mean:
var s = "Here is 1 for you, 3 for me, 42 for the others.";
s.match(/\d+/); // 1
s.match(/\d+/g); // 2
"My email address is someone@not.existing.zzz".match(/(\S*)@(\S*)/); // 3
1. We ask match() to return just the first substring matching the specified pattern, in this case a sequence of one or more digits. It returns an array containing one element, "1".
2. Global matching: an array is returned containing all the matching substrings, in this case three strings, "1", "3", and "42".
3. The RegExp describes a (very simplified) email address: the first part is the sequence of non-blank characters before the "at" symbol; the second part is whatever follows it, up to the next blank or the end of the string. Notice that the parentheses ask match() to capture the left and right sides of the "at" as $1 and $2. In this case match() returns an array containing the full match in first position, followed by $1, $2, ...
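
The three result shapes can be inspected side by side with a short sketch like the one below. It also shows one case not listed above: when nothing matches, match() returns null rather than an empty array, so the result should be checked before it is indexed. Variable names are illustrative only.

```javascript
const s = "Here is 1 for you, 3 for me, 42 for the others.";

// Without "g": only the first match, plus its position in the "index" property.
const first = s.match(/\d+/);
console.log(first[0], first.index); // 1 8

// With "g": every match, as plain strings.
const all = s.match(/\d+/g);
console.log(all); // [ '1', '3', '42' ]

// No match at all: null, not an empty array.
const none = s.match(/xyz/);
console.log(none); // null

// Capturing groups: full match first, then $1, $2, ...
const email = "My email address is someone@not.existing.zzz".match(/(\S*)@(\S*)/);
const [whole, user, host] = email;
console.log(whole); // someone@not.existing.zzz
console.log(user);  // someone
console.log(host);  // not.existing.zzz
```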

replace()

This function returns a copy of the original string in which the text matched by the RegExp, if any, is replaced by the second parameter:
"... javaScript ... javascript ... Javascript ...".replace(/javascript/gi, "JavaScript"); // 1
"I want to have 'doublequotes' in this string!".replace(/'([^']*)'/g, '"$1"'); // 2
1. Notice the two flags attached to the RegExp, specifying that we are performing a global and case-insensitive search. All three variations of "JavaScript" in the string are found and replaced by the string passed as the second argument to the function.
2. What we are saying here is: search for each {g option} sequence starting with a single quote {'}, followed by any number {*} of any character but a single quote {[^']}, and then a closing single quote; call $1 the subexpression in the round parentheses. Then replace what you have found with a double quote, the $1 subexpression, and another double quote.
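
A brief sketch of the two points above, plus what happens when the "g" flag is left out: only the first occurrence is replaced. It also confirms that the original string is never modified. Variable names are illustrative only.

```javascript
const original = "... javaScript ... javascript ... Javascript ...";

// Global and case-insensitive: every variation is normalized.
const fixedAll = original.replace(/javascript/gi, "JavaScript");

// Without "g", only the first match is replaced.
const fixedFirst = original.replace(/javascript/i, "JavaScript");

// $1 in the replacement stands for the text captured by the parentheses.
const quoted = "I want to have 'doublequotes' in this string!"
  .replace(/'([^']*)'/g, '"$1"');

console.log(fixedAll);   // ... JavaScript ... JavaScript ... JavaScript ...
console.log(fixedFirst); // ... JavaScript ... javascript ... Javascript ...
console.log(quoted);     // I want to have "doublequotes" in this string!
console.log(original);   // ... javaScript ... javascript ... Javascript ...
```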
