No, as mentioned, I didn't check for mixed inputs, which is probably
necessary to ensure we don't slow down the typical case due
to adding an extra branch, so I went about and made an attempt to write
a microbenchmark to capture this[1] by making the count
parameter unpredictable without inducing much overhead (no randomness in
the benchmark code itself), which should make
it sufficiently hard for the compiler to prune the branch checks. I
don't think generating different char[]s would change things,
but feel free to improve upon this and generate random char[]s to verify
that.
Doing this, I added Ivan Gerasimov's idea to fold the rare count == 0
check into the count < 0 clause, which seemed like a
reasonable way to avoid adding the extra branch to the normal path[2],
although it makes the code a little weirder. I'll denote
this version as experiment2, and my original code as experiment.
Results grouped on maxCount parameter, then ordered internally from
baseline, experiment, experiment2. Probability of getting
count == 0 is roughly 1/maxCount, and the cost per operation (average)
increases as maxCount goes up as we're copying more
data on average, the interesting metric is whether or not this
optimization breaks the normal flow when we're only rarely
hitting the non-allocating branch, which would show up as the experiment
underperforming versus the baseline as we increase
maxCount:
Benchmark (maxCount) (numCount) Mode
Samples Score Error Units
o.s.SimpleBench.mixedNewStringFromCount 2 1000000
thrpt 50 46.114 ± 0.391 ops/us
o.s.SimpleBench.mixedNewStringFromCount 2 1000000
thrpt 50 64.320 ± 0.569 ops/us
o.s.SimpleBench.mixedNewStringFromCount 2 1000000
thrpt 50 64.874 ± 1.095 ops/us
o.s.SimpleBench.mixedNewStringFromCount 10 1000000
thrpt 50 48.609 ± 0.921 ops/us
o.s.SimpleBench.mixedNewStringFromCount 10 1000000
thrpt 50 50.562 ± 0.598 ops/us
o.s.SimpleBench.mixedNewStringFromCount 10 1000000
thrpt 50 50.724 ± 0.535 ops/us
o.s.SimpleBench.mixedNewStringFromCount 100 1000000
thrpt 50 31.422 ± 0.553 ops/us
o.s.SimpleBench.mixedNewStringFromCount 100 1000000
thrpt 50 31.416 ± 0.512 ops/us
o.s.SimpleBench.mixedNewStringFromCount 100 1000000
thrpt 50 31.852 ± 0.441 ops/us
o.s.SimpleBench.mixedNewStringFromCount 1000 1000000
thrpt 50 5.528 ± 0.053 ops/us
o.s.SimpleBench.mixedNewStringFromCount 1000 1000000
thrpt 50 5.531 ± 0.039 ops/us
o.s.SimpleBench.mixedNewStringFromCount 1000 1000000
thrpt 50 5.550 ± 0.033 ops/us
To me it looks like the micro-optimization can be applied either way
without much risk that it'll slow down the normal flow.
There's some indication that Ivan's suggestion is a slight improvement
across the board.
/Claes
[1]
package org.sample;
import org.openjdk.jmh.annotations.*;
import java.util.concurrent.ThreadLocalRandom;
@State(Scope.Thread)
public class SimpleBench {
@Param("1000000")
public int numCount;
@Param("10")
public int maxCount;
public int[] counts;
public char[] data;
public int index = 0;
@Setup
public void buildCounts() {
counts = new int[numCount];
data = new char[maxCount];
ThreadLocalRandom r = ThreadLocalRandom.current();
for (int i = 0; i < numCount; i++) {
counts[i] = r.nextInt(0, maxCount);
}
for (int i = 0; i < maxCount; i++) {
data[i] = 'a';
}
}
@Benchmark
public String mixedNewStringFromCount() {
if (index >= numCount) {
index = 0;
}
return new String(data, 0, counts[index++]);
}
}
[2] http://cr.openjdk.java.net/~redestad/8067471/webrev.03/
On 2014-12-21 17:08, Lev Priima wrote:
Claes,
Is the source of arrays in these benchmarks unpredictable enough to
avoid branch checks elimination in compiled code?
On 12/19/2014 05:25 PM, Claes Redestad wrote:
Hi again,
I just had a nagging thought. For applications reporting some or a
lot of different empty String objects,
this might be due not to use of this constructor, but due to things
like an empty StringBuilder.toString() ending
up calling java.lang.String(char[] value, int offset, int count) with
a count of 0.
My feeling is that this would be a more probable/reasonable source of
distinct empty strings in an application,
and would not be helped much by the current patch.
For completeness the java.lang.String(char/int[], int, int)
constructors could have a check for count == 0 in
which case it assigns value to "".value, see
http://cr.openjdk.java.net/~redestad/8067471/webrev.01/
Running a few simple microbenchmarks on this (sorry, Aleksey!)[1]
yields a bit of speedup, at no discernible
cost for a quick sanity check of an actually copying path:
baseline:
Benchmark Mode Samples Score Error
Units
o.s.SimpleBench.emptySBBench thrpt 5 48.213 ± 3.509
ops/us
o.s.SimpleBench.emptyString thrpt 5 103.900 ± 4.869
ops/us
o.s.SimpleBench.emptyStringCount thrpt 5 105.802 ± 3.355
ops/us
o.s.SimpleBench.emptyStringCountCP thrpt 5 104.464 ± 8.251
ops/us
o.s.SimpleBench.singleStringCount thrpt 5 76.379 ± 4.139
ops/us
o.s.SimpleBench.singleStringCountCP thrpt 5 88.202 ± 4.945
ops/us
experiment:
Benchmark Mode Samples Score Error
Units
o.s.SimpleBench.emptySBBench thrpt 5 153.822 ± 7.900
ops/us 3x
o.s.SimpleBench.emptyString thrpt 5 183.588 ± 4.429
ops/us 1.75x
o.s.SimpleBench.emptyStringCount thrpt 5 166.457 ± 9.202
ops/us 1.6x
o.s.SimpleBench.emptyStringCountCP thrpt 5 168.089 ± 4.676
ops/us 1.6x
o.s.SimpleBench.singleStringCount thrpt 5 75.780 ± 8.490
ops/us 1x
o.s.SimpleBench.singleStringCountCP thrpt 5 89.202 ± 3.714
ops/us 1x
I guess we'd need to check that compiled code don't get messed up for
a mix of inputs where count can be both
zero and non-zero with some probability.
Do we have such benchmarks examples?
/Claes
[1] public char[] emptyCharArray = new char[0];
public int[] emptyIntArray = new int[0];
public char[] singleCharArray = new char[] { 'x' };
public int[] singleIntArray = new int[] { 102 };
public StringBuilder emptySB = new StringBuilder();
@Benchmark
public String emptySBBench() {
return emptySB.toString();
}
@Benchmark
public String emptyString() {
return new String();
}
@Benchmark
public String emptyStringCount() {
return new String(emptyCharArray, 0, 0);
}
@Benchmark
public String emptyStringCountCP() {
return new String(emptyIntArray, 0, 0);
}
@Benchmark
public String singleStringCount() {
return new String(singleCharArray, 0, 1);
}
@Benchmark
public String singleStringCountCP() {
return new String(singleIntArray, 0, 1);
}
On 2014-12-19 01:41, Lev Priima wrote:
Claes,
Thanks for cool suggestion:
http://cr.openjdk.java.net/~lpriima/8067471/webrev.01/:
public java.lang.String();
Signature: ()V
flags: ACC_PUBLIC
LineNumberTable:
line 137: 0
line 138: 4
line 139: 13
Code:
stack=2, locals=1, args_size=1
0: aload_0
1: invokespecial #1 // Method
java/lang/Object."<init>":()V
4: aload_0
5: ldc #2 // String
7: getfield #3 // Field value:[C
10: putfield #3 // Field value:[C
13: return
LineNumberTable:
line 137: 0
line 138: 4
Aleksey ,
By naive glance, new constructor thrpt is bigger(jdk9/dev vs 9b42
in attach) on:
@Benchmark
public String getNewString() {
return new String();
}
Please suggest things to compare . Do we have bigapps-like
benchmarks which may show regression on this?
On 17.12.2014 21:10, Aleksey Shipilev wrote:
On 17.12.2014 18:58, Claes Redestad wrote:
On 2014-12-17 11:22, Lev Priima wrote:
Please review space optimization in no args String constructor.
Originally, it was already rejected once by suggestion in
http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-May/010300.html
w/o formal justification of pros and contras. And enhancement was
requested again by Nathan Reynolds.
Issue: https://bugs.openjdk.java.net/browse/JDK-8067471
Patch: http://cr.openjdk.java.net/~lpriima/8067471/webrev.00/
I am OK with this change pending we understand the performance
impact a
little better. Please do benchmarks.
Could this have some obscure security implications, such as the
ability
to lock up some subsystem via retrieving and synchronizing on the
shared
new String().value? I guess not, for practical purposes.
This would be an issue if we published String.value directly, but
we don't.
I guess it could also be written:
public String() {
this.value = "".value;
}
... which would saves some byte code and should avoid having to
allocate
a distinct object for this.
Oh, I like this trick, given "" is probably already in constant
pool for
some other reason. However, there is an additional dereference. We
need
a targeted nanobenchmark to properly quantify the costs for either
variant.
-Aleksey.