No, as mentioned, I didn't check for mixed inputs, which is probably necessary to ensure we don't slow down the typical case due to adding an extra branch, so I went about and made an attempt to write a microbenchmark to capture this[1] by making the count parameter unpredictable without inducing much overhead (no randomness in the benchmark code itself), which should make it sufficiently hard for the compiler to prune the branch checks. I don't think generating different char[]s would change things, but feel free to improve upon this and generate random char[]s to verify that.

Doing this, I added Ivan Gerasimov's idea to fold the rare count == 0 check into the count < 0 clause, which seemed like a reasonable way to avoid adding the extra branch to the normal path[2], although it makes the code a little weirder. I'll denote
this version as experiment2, and my original code as experiment.

Results grouped on maxCount parameter, then ordered internally from baseline, experiment, experiment2. Probability of getting count == 0 is roughly 1/maxCount, and the cost per operation (average) increases as maxCount goes up as we're copying more data on average, the interesting metric is whether or not this optimization breaks the normal flow when we're only rarely hitting the non-allocating branch, which would show up as the experiment underperforming versus the baseline as we increase
maxCount:

Benchmark (maxCount) (numCount) Mode Samples Score Error Units o.s.SimpleBench.mixedNewStringFromCount 2 1000000 thrpt 50 46.114 ± 0.391 ops/us o.s.SimpleBench.mixedNewStringFromCount 2 1000000 thrpt 50 64.320 ± 0.569 ops/us o.s.SimpleBench.mixedNewStringFromCount 2 1000000 thrpt 50 64.874 ± 1.095 ops/us

o.s.SimpleBench.mixedNewStringFromCount 10 1000000 thrpt 50 48.609 ± 0.921 ops/us o.s.SimpleBench.mixedNewStringFromCount 10 1000000 thrpt 50 50.562 ± 0.598 ops/us o.s.SimpleBench.mixedNewStringFromCount 10 1000000 thrpt 50 50.724 ± 0.535 ops/us

o.s.SimpleBench.mixedNewStringFromCount 100 1000000 thrpt 50 31.422 ± 0.553 ops/us o.s.SimpleBench.mixedNewStringFromCount 100 1000000 thrpt 50 31.416 ± 0.512 ops/us o.s.SimpleBench.mixedNewStringFromCount 100 1000000 thrpt 50 31.852 ± 0.441 ops/us

o.s.SimpleBench.mixedNewStringFromCount 1000 1000000 thrpt 50 5.528 ± 0.053 ops/us o.s.SimpleBench.mixedNewStringFromCount 1000 1000000 thrpt 50 5.531 ± 0.039 ops/us o.s.SimpleBench.mixedNewStringFromCount 1000 1000000 thrpt 50 5.550 ± 0.033 ops/us

To me it looks like the micro-optimization can be applied either way without much risk that it'll slow down the normal flow. There's some indication that Ivan's suggestion is a slight improvement across the board.

/Claes

[1]
package org.sample;

import org.openjdk.jmh.annotations.*;
import java.util.concurrent.ThreadLocalRandom;

@State(Scope.Thread)
public class SimpleBench {

    @Param("1000000")
    public int numCount;

    @Param("10")
    public int maxCount;

    public int[] counts;

    public char[] data;

    public int index = 0;

    @Setup
    public void buildCounts() {
        counts = new int[numCount];
        data = new char[maxCount];
        ThreadLocalRandom r = ThreadLocalRandom.current();
        for (int i = 0; i < numCount; i++) {
            counts[i] = r.nextInt(0, maxCount);
        }
        for (int i = 0; i < maxCount; i++) {
            data[i] = 'a';
        }
    }

    @Benchmark
    public String mixedNewStringFromCount() {
        if (index >= numCount) {
            index = 0;
        }
        return new String(data, 0, counts[index++]);
    }
}

[2] http://cr.openjdk.java.net/~redestad/8067471/webrev.03/

On 2014-12-21 17:08, Lev Priima wrote:
Claes,

Is the source of arrays in these benchmarks unpredictable enough to avoid branch checks elimination in compiled code?

On 12/19/2014 05:25 PM, Claes Redestad wrote:
Hi again,

I just had a nagging thought. For applications reporting some or a lot of different empty String objects, this might be due not to use of this constructor, but due to things like an empty StringBuilder.toString() ending up calling java.lang.String(char[] value, int offset, int count) with a count of 0.

My feeling is that this would be a more probable/reasonable source of distinct empty strings in an application,
and would not be helped much by the current patch.

For completeness the java.lang.String(char/int[], int, int) constructors could have a check for count == 0 in which case it assigns value to "".value, see http://cr.openjdk.java.net/~redestad/8067471/webrev.01/

Running a few simple microbenchmarks on this (sorry, Aleksey!)[1] yields a bit of speedup, at no discernible
cost for a quick sanity check of an actually copying path:

baseline:
Benchmark Mode Samples Score Error Units o.s.SimpleBench.emptySBBench thrpt 5 48.213 ± 3.509 ops/us o.s.SimpleBench.emptyString thrpt 5 103.900 ± 4.869 ops/us o.s.SimpleBench.emptyStringCount thrpt 5 105.802 ± 3.355 ops/us o.s.SimpleBench.emptyStringCountCP thrpt 5 104.464 ± 8.251 ops/us o.s.SimpleBench.singleStringCount thrpt 5 76.379 ± 4.139 ops/us o.s.SimpleBench.singleStringCountCP thrpt 5 88.202 ± 4.945 ops/us

experiment:
Benchmark Mode Samples Score Error Units o.s.SimpleBench.emptySBBench thrpt 5 153.822 ± 7.900 ops/us 3x o.s.SimpleBench.emptyString thrpt 5 183.588 ± 4.429 ops/us 1.75x o.s.SimpleBench.emptyStringCount thrpt 5 166.457 ± 9.202 ops/us 1.6x o.s.SimpleBench.emptyStringCountCP thrpt 5 168.089 ± 4.676 ops/us 1.6x o.s.SimpleBench.singleStringCount thrpt 5 75.780 ± 8.490 ops/us 1x o.s.SimpleBench.singleStringCountCP thrpt 5 89.202 ± 3.714 ops/us 1x

I guess we'd need to check that compiled code don't get messed up for a mix of inputs where count can be both
zero and non-zero with some probability.
Do we have such benchmarks examples?

/Claes

[1] public char[] emptyCharArray = new char[0];
    public int[] emptyIntArray = new int[0];

    public char[] singleCharArray = new char[] { 'x' };
    public int[] singleIntArray = new int[] { 102 };
    public StringBuilder emptySB = new StringBuilder();

    @Benchmark
    public String emptySBBench() {
        return emptySB.toString();
    }

    @Benchmark
    public String emptyString() {
        return new String();
    }

    @Benchmark
    public String emptyStringCount() {
        return new String(emptyCharArray, 0, 0);
    }

    @Benchmark
    public String emptyStringCountCP() {
        return new String(emptyIntArray, 0, 0);
    }

    @Benchmark
    public String singleStringCount() {
        return new String(singleCharArray, 0, 1);
    }

    @Benchmark
    public String singleStringCountCP() {
        return new String(singleIntArray, 0, 1);
    }


On 2014-12-19 01:41, Lev Priima wrote:
Claes,
Thanks for cool suggestion: http://cr.openjdk.java.net/~lpriima/8067471/webrev.01/:
  public java.lang.String();
    Signature: ()V
    flags: ACC_PUBLIC
    LineNumberTable:
      line 137: 0
      line 138: 4
      line 139: 13
    Code:
      stack=2, locals=1, args_size=1
         0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
         4: aload_0
         5: ldc           #2                  // String
         7: getfield      #3                  // Field value:[C
        10: putfield      #3                  // Field value:[C
        13: return
      LineNumberTable:
        line 137: 0
        line 138: 4


Aleksey ,
By naive glance, new constructor thrpt is bigger(jdk9/dev vs 9b42 in attach) on:
    @Benchmark
    public String getNewString() {
        return new String();
    }
Please suggest things to compare . Do we have bigapps-like benchmarks which may show regression on this?

On 17.12.2014 21:10, Aleksey Shipilev wrote:
On 17.12.2014 18:58, Claes Redestad wrote:
On 2014-12-17 11:22, Lev Priima wrote:
Please review space optimization in no args String constructor.
Originally, it was already rejected once by suggestion in
http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-May/010300.html
w/o formal justification of pros and contras.  And enhancement was
requested again by Nathan Reynolds.

Issue: https://bugs.openjdk.java.net/browse/JDK-8067471
Patch: http://cr.openjdk.java.net/~lpriima/8067471/webrev.00/
I am OK with this change pending we understand the performance impact a
little better. Please do benchmarks.


Could this have some obscure security implications, such as the ability to lock up some subsystem via retrieving and synchronizing on the shared
new String().value? I guess not, for practical purposes.
This would be an issue if we published String.value directly, but we don't.

I guess it could also be written:

     public String() {
         this.value = "".value;
     }

... which would saves some byte code and should avoid having to allocate
a distinct object for this.
Oh, I like this trick, given "" is probably already in constant pool for some other reason. However, there is an additional dereference. We need a targeted nanobenchmark to properly quantify the costs for either variant.

-Aleksey.






Reply via email to