Limited String.split()

The Java String.split() method is fun and useful, but I never thought of it as interesting. I was wrong. For instance, until the other day I had never paid much attention to its two-parameter overload, which limits the number of tokens generated.

Basic splitting

Usually we use the one-argument overload of String.split(), which converts the string into an array of Strings, using the passed parameter (a regular expression) as the delimiter.

Let's have a look at this code, where s is a String:
System.out.print("Split [");
for(String c : s.split("/")) {
    System.out.print('_' + c + '_' );
}
System.out.println("]");
If s is "alpha/beta/gamma/delta/tango/foxtrot//", we get
Split [_alpha__beta__gamma__delta__tango__foxtrot_]
The slashes delimit six non-empty tokens, and the empty strings produced by the two trailing slashes are discarded, so the output is an array of six Strings.

Passing "/alpha/beta/gamma/delta/tango/foxtrot" to the same code we get
Split [___alpha__beta__gamma__delta__tango__foxtrot_]
Seven tokens! The leading slash is treated as a delimiter between an empty element and "alpha", and a leading empty string, unlike trailing ones, is kept.
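
To check these rules on a few more inputs, here is a minimal sketch. The class name BasicSplitDemo and the helper render() are made up for this example; render() just builds the same output as the loop above, returning it as a String instead of printing it.

```java
class BasicSplitDemo {
    // Hypothetical helper: wraps each token in underscores, like the loop above.
    static String render(String s) {
        StringBuilder sb = new StringBuilder("Split [");
        for (String token : s.split("/")) {
            sb.append('_').append(token).append('_');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // Trailing empty strings are dropped: six tokens.
        System.out.println(render("alpha/beta/gamma/delta/tango/foxtrot//"));
        // The leading empty string is kept: seven tokens.
        System.out.println(render("/alpha/beta/gamma/delta/tango/foxtrot"));
        // An empty string between two tokens is kept too: prints 3.
        System.out.println("a//b".split("/").length);
        // A string made only of delimiters yields an empty array: prints 0.
        System.out.println("//".split("/").length);
    }
}
```

The last call is the one to watch out for: all the resulting tokens are empty and trailing, so nothing is left in the array.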

Limited splitting

A bit of refactoring on the previous code: it now uses the other split() overload, which takes an integer limit, i, and adds an index to the output to make it clearer:
String[] c = s.split("/", i);
for(int j = 0; j < c.length; ++j) {
    System.out.print(j + ":_" + c[j] + '_');
}
System.out.println();

Negative limit

If i is -1 (or any other negative number), passing "alpha/beta/gamma/delta/tango/foxtrot//" we get
0:_alpha_1:_beta_2:_gamma_3:_delta_4:_tango_5:_foxtrot_6:__7:__
Even the two empty elements at the end are generated!

If the input is set to "/alpha/beta/gamma/delta/tango/foxtrot" the result is less interesting: the tokens are the same as in the basic version, since there are no trailing empty strings to preserve.
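
The difference between the basic call and a negative limit can be summed up by counting tokens. This is only a sketch: NegativeLimitDemo and count() are names invented for the example.

```java
class NegativeLimitDemo {
    // Hypothetical helper: number of tokens split() produces for a given limit.
    static int count(String s, int limit) {
        return s.split("/", limit).length;
    }

    public static void main(String[] args) {
        String trailing = "alpha/beta/gamma/delta/tango/foxtrot//";
        String leading = "/alpha/beta/gamma/delta/tango/foxtrot";

        System.out.println(trailing.split("/").length); // 6: trailing empties dropped
        System.out.println(count(trailing, -1));        // 8: trailing empties kept
        System.out.println(leading.split("/").length);  // 7
        System.out.println(count(leading, -1));         // 7: nothing to preserve
        // Any negative value behaves like -1.
        System.out.println(count(trailing, -42));       // 8
    }
}
```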

Zero limit

With a limit of zero, the two-argument version of split() behaves exactly like the basic one: trailing empty strings are discarded, and we get the same results.
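
A quick way to convince ourselves is to compare the two arrays directly. In this sketch, ZeroLimitDemo and sameAsBasic() are hypothetical names used only here.

```java
import java.util.Arrays;

class ZeroLimitDemo {
    // Hypothetical helper: true if split(regex) and split(regex, 0) agree on s.
    static boolean sameAsBasic(String s) {
        return Arrays.equals(s.split("/"), s.split("/", 0));
    }

    public static void main(String[] args) {
        System.out.println(sameAsBasic("alpha/beta/gamma/delta/tango/foxtrot//")); // true
        System.out.println(sameAsBasic("/alpha/beta/gamma/delta/tango/foxtrot"));  // true
    }
}
```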

Positive limit

Here we are asking split() to generate at most a given number of tokens. Once the limit is reached, the rest of the string, delimiters included, ends up in the last token.

If the input string s is "alpha/beta/gamma/delta/tango/foxtrot//" we get these results, varying i from 1 to 4:
0:_alpha/beta/gamma/delta/tango/foxtrot//_
0:_alpha_1:_beta/gamma/delta/tango/foxtrot//_
0:_alpha_1:_beta_2:_gamma/delta/tango/foxtrot//_
0:_alpha_1:_beta_2:_gamma_3:_delta/tango/foxtrot//_
Same behavior for "/alpha/beta/gamma/delta/tango/foxtrot":
0:_/alpha/beta/gamma/delta/tango/foxtrot_
0:__1:_alpha/beta/gamma/delta/tango/foxtrot_
0:__1:_alpha_2:_beta/gamma/delta/tango/foxtrot_
0:__1:_alpha_2:_beta_3:_gamma/delta/tango/foxtrot_
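
One more case worth trying is a positive limit larger than the number of tokens available. The sketch below (PositiveLimitDemo and render() are names made up for the example) shows that the limit is only an upper bound, and that with a positive limit the trailing empty strings are kept, as with a negative one.

```java
class PositiveLimitDemo {
    // Hypothetical helper: same output format as the loop above, returned as a String.
    static String render(String s, int limit) {
        String[] tokens = s.split("/", limit);
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < tokens.length; ++j) {
            sb.append(j).append(":_").append(tokens[j]).append('_');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "alpha/beta/gamma/delta/tango/foxtrot//";
        // Limit reached: the remainder stays in the last token.
        System.out.println(render(s, 2));
        // Limit never reached: eight tokens, the two trailing empty ones included.
        System.out.println(render(s, 100));
        System.out.println(s.split("/", 100).length); // 8
    }
}
```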
