Friday, October 19, 2007

String Concatenation

An interesting interview with Heinz Kabutz.

Here is an excerpt on the performance of different ways of concatenating Strings in Java.

In the early days of Java programming, I sometimes resorted to "clever" coding. For
example, when I was optimizing a system written by a company in Germany, I changed
the String addition to use StringBuffer after we had optimized the architecture and
design of the system and wanted to improve things a bit. Don't read too much into
microbenchmarks. Performance advantages come from good design and an appropriate
architecture.

We start with a basic concatenation based on +=:

public static String concat1(String s1, String s2, String s3,
                             String s4, String s5, String s6) {
    String result = "";
    result += s1;
    result += s2;
    result += s3;
    result += s4;
    result += s5;
    result += s6;
    return result;
}



String is immutable, so the compiled code will create many intermediate String
objects, which can strain the garbage collector. A common remedy is to introduce
StringBuffer, so that the method looks like this:

public static String concat2(String s1, String s2, String s3,
                             String s4, String s5, String s6) {
    StringBuffer result = new StringBuffer();
    result.append(s1);
    result.append(s2);
    result.append(s3);
    result.append(s4);
    result.append(s5);
    result.append(s6);
    return result.toString();
}



But the code is becoming less legible, which is undesirable.

Using JDK 6.0_02 and the server HotSpot compiler, I can execute concat1() a million
times in 2013 milliseconds, but concat2() in 734 milliseconds. At this point, I might
congratulate myself for making the code three times faster. However, the user won't
notice it if 0.1 percent of the program becomes three times faster.
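
Timings like these can be reproduced with a small harness. The following is only a rough sketch, not the benchmark used in the interview: the class name ConcatTiming, the helper timeMillis(), the sink field, and the input strings are all assumptions made for illustration. The numbers it prints will vary with the JVM, the hardware, and the amount of warm-up, so they should not be compared directly with the figures quoted above.

```java
public class ConcatTiming {
    // Written to by the timing loop so the JIT cannot discard the results.
    static long sink;

    // Same logic as concat1() in the text: repeated += on a String.
    static String concat1(String s1, String s2, String s3,
                          String s4, String s5, String s6) {
        String result = "";
        result += s1;
        result += s2;
        result += s3;
        result += s4;
        result += s5;
        result += s6;
        return result;
    }

    // Same logic as concat2() in the text: one explicit StringBuffer.
    static String concat2(String s1, String s2, String s3,
                          String s4, String s5, String s6) {
        StringBuffer result = new StringBuffer();
        result.append(s1);
        result.append(s2);
        result.append(s3);
        result.append(s4);
        result.append(s5);
        result.append(s6);
        return result.toString();
    }

    // Hypothetical helper: runs one variant `runs` times, returns elapsed ms.
    // Integer.toString(i) varies the input; its cost is included in both timings.
    static long timeMillis(boolean useBuffer, int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            String first = Integer.toString(i);
            String s = useBuffer
                    ? concat2(first, "b", "c", "d", "e", "f")
                    : concat1(first, "b", "c", "d", "e", "f");
            sink += s.length();
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        int runs = 1000000;
        // Warm-up passes so the measured passes run compiled code.
        timeMillis(false, runs);
        timeMillis(true, runs);
        System.out.println("concat1: " + timeMillis(false, runs) + " ms");
        System.out.println("concat2: " + timeMillis(true, runs) + " ms");
    }
}
```

A single warm-up pass and one measured pass is a crude setup; it is enough to see the order of magnitude, which is all the argument here needs.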

Here's a third approach that I used to make my code run faster, back in the days of
JDK 1.3. Instead of creating an empty StringBuffer, I sized it to the number of
required characters, like so:

public static String concat3(String s1, String s2, String s3,
                             String s4, String s5, String s6) {
    return new StringBuffer(
            s1.length() + s2.length() + s3.length() + s4.length() +
            s5.length() + s6.length())
        .append(s1).append(s2).append(s3)
        .append(s4).append(s5).append(s6)
        .toString();
}



I managed to call that a million times in 604 milliseconds, even faster than
concat2(). But is this the best way to add the strings? And what is the simplest way?

The approach in concat4() illustrates another way:

public static String concat4(String s1, String s2, String s3,
                             String s4, String s5, String s6) {
    return s1 + s2 + s3 + s4 + s5 + s6;
}



You can hardly make it simpler than that. Interestingly, in Java SE 6, I can call the
code a million times in 578 milliseconds, which is even better than the far more
complicated concat3(). The method is cleaner, easier to understand, and quicker than
our previous best effort.

Sun introduced the StringBuilder class in J2SE 5.0, which is almost the same as
StringBuffer, except it's not thread-safe. Thread safety is usually not necessary
with StringBuffer, since it is seldom shared between threads. When Strings are added
using the + operator, the compiler in J2SE 5.0 and Java SE 6 will automatically use
StringBuilder. If StringBuffer is hard-coded, this optimization will not occur.
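
To make the compiler's rewrite concrete, here is a sketch of roughly what the + expression turns into under the javac of J2SE 5.0 and Java SE 6. The class name ConcatDesugared and the method name concat4Expanded() are hypothetical, and the expanded form is an approximation of the generated bytecode written back as source, not something the interview shows.

```java
public class ConcatDesugared {
    // What the programmer writes.
    static String concat4(String s1, String s2, String s3,
                          String s4, String s5, String s6) {
        return s1 + s2 + s3 + s4 + s5 + s6;
    }

    // Approximately what the compiler generates for the expression above:
    // one unsynchronized StringBuilder, one append per operand.
    static String concat4Expanded(String s1, String s2, String s3,
                                  String s4, String s5, String s6) {
        return new StringBuilder()
                .append(s1).append(s2).append(s3)
                .append(s4).append(s5).append(s6)
                .toString();
    }

    public static void main(String[] args) {
        String a = concat4("a", "b", "c", "d", "e", "f");
        String b = concat4Expanded("a", "b", "c", "d", "e", "f");
        System.out.println(a.equals(b)); // prints "true"
    }
}
```

Because the choice of helper class is left to the compiler, a later compiler is free to pick a different strategy for the same source line. Later Java releases did exactly that: since Java 9, javac compiles + on Strings to an invokedynamic call rather than to an explicit StringBuilder chain.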

When a time-critical method causes a significant bottleneck in your application, it's
possible to speed up string concatenation by doing this:

public static String concat5(String s1, String s2, String s3,
                             String s4, String s5, String s6) {
    return new StringBuilder(
            s1.length() + s2.length() + s3.length() + s4.length() +
            s5.length() + s6.length())
        .append(s1).append(s2).append(s3)
        .append(s4).append(s5).append(s6)
        .toString();
}


However, doing this prevents future versions of the Java platform from automatically
speeding up the system, and again, it makes the code more difficult to read.
