{variable code, maths, language, concepts, puzzles, etc;}: String Concatenation

An interesting interview with Heinz Kabutz.

Here is an excerpt in which he discusses the performance of different ways of concatenating Strings in Java:

In the early days of Java programming, I sometimes resorted to "clever" coding. For 
example, when I was optimizing a system written by a company in Germany, I changed 
the String addition to use StringBuffer after we had optimized the architecture and 
design of the system and wanted to improve things a bit. Don't read too much into 
microbenchmarks. Performance advantages come from good design and an appropriate 
architecture.

We start with a basic concatenation based on +=:

  public static String concat1(String s1, String s2, String s3,
                               String s4, String s5, String s6) {
    String result = "";
    result += s1;
    result += s2;
    result += s3;
    result += s4;
    result += s5;
    result += s6;
    return result;
  }
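To see where the intermediate objects come from, here is a rough sketch of what a J2SE 5.0 or Java SE 6 compiler produces for concat1(), written back as Java source. It is an approximation of the generated bytecode rather than literal compiler output, and the class and method names (`PlusEqualsSketch`, `concat1Expanded`) are hypothetical.

```java
// Hypothetical class: an approximation of what J2SE 5.0 / Java SE 6 javac
// generates for concat1(), written back as Java source.
public class PlusEqualsSketch {

    public static String concat1Expanded(String s1, String s2, String s3,
                                         String s4, String s5, String s6) {
        String result = "";
        // Each "result += sN" becomes a new StringBuilder, two appends and a
        // toString(), so every step copies all the characters gathered so far.
        result = new StringBuilder().append(result).append(s1).toString();
        result = new StringBuilder().append(result).append(s2).toString();
        result = new StringBuilder().append(result).append(s3).toString();
        result = new StringBuilder().append(result).append(s4).toString();
        result = new StringBuilder().append(result).append(s5).toString();
        result = new StringBuilder().append(result).append(s6).toString();
        // Six builders and six new Strings were allocated; only the last
        // String is returned, the rest become garbage immediately.
        return result;
    }

    public static void main(String[] args) {
        System.out.println(concat1Expanded("Heinz", " ", "Kabutz", ", ", "Java", "!"));
    }
}
```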

 

String is immutable, so the compiled code will create many intermediate String 
objects, which can strain the garbage collector. A common remedy is to introduce 
StringBuffer, causing it to look like this:

  public static String concat2(String s1, String s2, String s3,
                               String s4, String s5, String s6) {
    StringBuffer result = new StringBuffer();
    result.append(s1);
    result.append(s2);
    result.append(s3);
    result.append(s4);
    result.append(s5);
    result.append(s6);
    return result.toString();
  }

 

But the code is becoming less legible, which is undesirable.

Using JDK 6.0_02 and the server HotSpot compiler, I can execute concat1() a million 
times in 2013 milliseconds, but concat2() in 734 milliseconds. At this point, I might
congratulate myself for making the code three times faster. However, the user won't 
notice it if 0.1 percent of the program becomes three times faster.
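Numbers like these typically come from a loop that calls each method a million times and measures the elapsed time. The sketch below is a minimal, naive harness of that kind, assuming `System.nanoTime()` for timing and a short warm-up so HotSpot can compile the methods first. The class name, warm-up scheme and input strings are my own assumptions, not the harness used in the interview, and it uses Java 8 method references. Absolute timings on a current JVM will differ a lot from the JDK 6 figures above, so treat the output with the same suspicion the excerpt recommends for microbenchmarks.

```java
// Hypothetical naive timing harness (Java 8+); not the one from the interview.
public class ConcatTiming {

    // Functional interface matching the six-argument concat methods.
    interface Concat6 {
        String apply(String s1, String s2, String s3,
                     String s4, String s5, String s6);
    }

    // Written after each timed loop so the JIT cannot treat the calls as dead code.
    static volatile int blackhole;

    public static String concat1(String s1, String s2, String s3,
                                 String s4, String s5, String s6) {
        String result = "";
        result += s1;
        result += s2;
        result += s3;
        result += s4;
        result += s5;
        result += s6;
        return result;
    }

    public static String concat2(String s1, String s2, String s3,
                                 String s4, String s5, String s6) {
        StringBuffer result = new StringBuffer();
        result.append(s1).append(s2).append(s3)
              .append(s4).append(s5).append(s6);
        return result.toString();
    }

    // Calls c 'calls' times and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Concat6 c, int calls) {
        int total = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            total += c.apply("Heinz", " ", "Kabutz", " ", "Java", "Specialists").length();
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        blackhole = total;
        return elapsedMillis;
    }

    public static void main(String[] args) {
        int calls = 1_000_000;
        // Warm-up runs give HotSpot a chance to compile both methods first.
        timeMillis(ConcatTiming::concat1, calls);
        timeMillis(ConcatTiming::concat2, calls);
        System.out.println("concat1: " + timeMillis(ConcatTiming::concat1, calls) + " ms");
        System.out.println("concat2: " + timeMillis(ConcatTiming::concat2, calls) + " ms");
    }
}
```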

Here's a third approach that I used to make my code run faster, back in the days of 
JDK 1.3. Instead of creating an empty StringBuffer, I sized it to the number of 
required characters, like so:

  public static String concat3(String s1, String s2, String s3,
                               String s4, String s5, String s6) {
    return new StringBuffer(
        s1.length() + s2.length() + s3.length() + s4.length() +
            s5.length() + s6.length()).append(s1).append(s2).
        append(s3).append(s4).append(s5).append(s6).toString();
  }
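The benefit of sizing the buffer comes from avoiding reallocation: a default `StringBuffer` starts with room for 16 characters and has to allocate a larger array and copy its contents whenever an append would overflow it, while a buffer created with the exact total length never grows. The hypothetical `PresizeDemo` class below compares the resulting capacities using `StringBuffer.capacity()`. The exact growth policy is a JDK implementation detail, so only the presized capacity is fully predictable.

```java
// Hypothetical demo class: shows why sizing the buffer up front helps.
public class PresizeDemo {

    // Sum of the part lengths, i.e. the exact capacity concat3() asks for.
    static int totalLength(String... parts) {
        int total = 0;
        for (String p : parts) {
            total += p.length();
        }
        return total;
    }

    // Appends every part to sb and returns sb's capacity afterwards.
    static int capacityAfterAppending(StringBuffer sb, String... parts) {
        for (String p : parts) {
            sb.append(p);
        }
        return sb.capacity();
    }

    public static void main(String[] args) {
        String[] parts = {"Heinz", " ", "Kabutz", " on ", "String", " concatenation"};
        int total = totalLength(parts);

        // Default constructor: room for 16 chars, reallocated (and copied)
        // whenever an append would overflow the current array.
        int growing = capacityAfterAppending(new StringBuffer(), parts);

        // Presized as in concat3(): allocated once, exactly big enough,
        // so no append has to reallocate the backing array.
        int presized = capacityAfterAppending(new StringBuffer(total), parts);

        System.out.println("total length      = " + total);
        System.out.println("growing capacity  = " + growing);
        System.out.println("presized capacity = " + presized);
    }
}
```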

 

I managed to call concat3() a million times in 604 milliseconds, which is even faster 
than concat2(). But is this the best way to add the strings? And what is the simplest way?

The approach in concat4() illustrates another way:

  public static String concat4(String s1, String s2, String s3,
                               String s4, String s5, String s6) {
    return s1 + s2 + s3 + s4 + s5 + s6;
  }

 

You can hardly make it simpler than that. Interestingly, in Java SE 6, I can call the
code a million times in 578 milliseconds, which is even better than the far more 
complicated concat3(). The method is cleaner, easier to understand, and quicker than 
our previous best effort.

Sun introduced the StringBuilder class in J2SE 5.0, which is almost the same as 
StringBuffer, except it's not thread-safe. Thread safety is usually not necessary 
with StringBuffer, since it is seldom shared between threads. When Strings are added 
using the + operator, the compiler in J2SE 5.0 and Java SE 6 will automatically use 
StringBuilder. If StringBuffer is hard-coded, this optimization will not occur.
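As a sketch of that translation, the hypothetical `PlusOperatorSketch` class below pairs concat4() with roughly what a J2SE 5.0 or Java SE 6 compiler generates for it: one `StringBuilder` with the default capacity, six appends and a single `toString()`, with none of the intermediate Strings that concat1() creates. This is an approximation, not exact compiler output. The point about the platform improving on its own has since held up: starting with JDK 9 (JEP 280), javac compiles `+` to an `invokedynamic` call bootstrapped by `java.lang.invoke.StringConcatFactory`, so code that uses `+` picks up the new strategy simply by being recompiled, while hand-written builder chains do not.

```java
// Hypothetical class: concat4() next to an approximation of what
// J2SE 5.0 / Java SE 6 javac generates for it.
public class PlusOperatorSketch {

    // What you write (concat4 from the excerpt).
    public static String concat4(String s1, String s2, String s3,
                                 String s4, String s5, String s6) {
        return s1 + s2 + s3 + s4 + s5 + s6;
    }

    // Roughly what the compiler emits: a single unsynchronized StringBuilder
    // with default capacity, one append per operand and one toString().
    // A null operand is appended as "null" in both versions.
    public static String concat4Expanded(String s1, String s2, String s3,
                                         String s4, String s5, String s6) {
        return new StringBuilder().append(s1).append(s2).append(s3)
                .append(s4).append(s5).append(s6).toString();
    }

    public static void main(String[] args) {
        System.out.println(concat4("Heinz", " ", "Kabutz", ", ", "Java", "!"));
        System.out.println(concat4Expanded("Heinz", " ", "Kabutz", ", ", "Java", "!"));
    }
}
```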

When a time-critical method causes a significant bottleneck in your application, it's
possible to speed up string concatenation by doing this:

  public static String concat5(String s1, String s2, String s3,
                               String s4, String s5, String s6) {
    return new StringBuilder(
        s1.length() + s2.length() + s3.length() + s4.length() +
            s5.length() + s6.length()).append(s1).append(s2).
        append(s3).append(s4).append(s5).append(s6).toString();
  }

 
However, doing this prevents future versions of the Java platform from automatically 
speeding up the system, and again, it makes the code more difficult to read.
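If a profiler does identify such a hot spot with a varying number of pieces, the presizing idea in concat5() can be generalized. The helper below is a minimal sketch; its name `concatPresized` is hypothetical, and the readability caveat above applies to it even more strongly than to concat5().

```java
// Hypothetical helper generalizing concat5() to any number of pieces.
public class PresizedConcat {

    public static String concatPresized(String... parts) {
        // First pass: compute the exact final length.
        // Like concat3() and concat5(), this throws NullPointerException
        // for a null part, whereas the + operator would append "null".
        int length = 0;
        for (String p : parts) {
            length += p.length();
        }
        // Second pass: append into a builder that never needs to grow.
        StringBuilder sb = new StringBuilder(length);
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concatPresized("Heinz", " ", "Kabutz"));
    }
}
```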

Friday, October 19, 2007