Hello,
in the code of HashSet(Collection) we have an optimization opportunity:
public HashSet(Collection<? extends E> c) {
map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
addAll(c);
}
instead of using addAll() inherited from j.u.Collection we can use
c.forEach(this::add):
public HashSet(Collection<? extends E> c) {
map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
c.forEach(this::add);
}
This allows to rid Iterator and use counting loops for e.g. ArrayList instead
of hasNext()/next().
To me the most common use cases of HashSet(Collection) are those:
- ArrayList/Arrays.asList() is passed to convert List->Set or/and excluded
duplicates
- HashSet is passed to have a copy of original Set
We are also likely to benefit from this change in case the argument collection
is synchronized as it's going to be locked only once.
I've used the benchmark [1] to analyze the impact and it demonstrates
measurable improvement [2].
What puzzles we a bit is the result for j.u.ArrayList. At warm-up time it
demonstrates better results than at measurement time:
length = 10
# Run progress: 4,44% complete, ETA 00:21:41
# Fork: 3 of 5
# Warmup Iteration 1: 134,699 ns/op
# Warmup Iteration 2: 115,391 ns/op
# Warmup Iteration 3: 130,110 ns/op
# Warmup Iteration 4: 114,465 ns/op <----
# Warmup Iteration 5: 114,849 ns/op
# Warmup Iteration 6: 115,073 ns/op
# Warmup Iteration 7: 113,988 ns/op
# Warmup Iteration 8: 115,301 ns/op
# Warmup Iteration 9: 115,452 ns/op
# Warmup Iteration 10: 124,493 ns/op <----
Iteration 1: 123,776 ns/op
Iteration 2: 123,719 ns/op
Iteration 3: 123,725 ns/op
Iteration 4: 126,892 ns/op
Iteration 5: 125,493 ns/op
Iteration 6: 124,661 ns/op
Iteration 7: 123,783 ns/op
Iteration 8: 123,975 ns/op
Iteration 9: 124,047 ns/op
Iteration 10: 123,899 ns/op
length = 100
# Run progress: 11,11% complete, ETA 00:20:10
# Fork: 1 of 5
# Warmup Iteration 1: 1236,695 ns/op
# Warmup Iteration 2: 1069,909 ns/op
# Warmup Iteration 3: 1243,852 ns/op
# Warmup Iteration 4: 1059,038 ns/op <----
# Warmup Iteration 5: 1057,670 ns/op
# Warmup Iteration 6: 1057,117 ns/op
# Warmup Iteration 7: 1056,932 ns/op
# Warmup Iteration 8: 1054,909 ns/op
# Warmup Iteration 9: 1058,106 ns/op
# Warmup Iteration 10: 1145,699 ns/op <----
Iteration 1: 1139,155 ns/op
Iteration 2: 1141,662 ns/op
Iteration 3: 1139,061 ns/op
Iteration 4: 1139,306 ns/op
Iteration 5: 1138,475 ns/op
Iteration 6: 1140,491 ns/op
Iteration 7: 1144,017 ns/op
Iteration 8: 1139,411 ns/op
Iteration 9: 1142,729 ns/op
Iteration 10: 1144,042 ns/op
Here I use 1 s warm-up iteration on 2s measurement iteration. It looks like at
iteration 4 the code is top-optimized,
and later at the end of warm-up a kind of deoptimization happens. There's still
visible improvement however.
The benchmark is run with 5 forks, and this deoptimization is visible for
length = {10, 100}.
So my two question are:
- What is the reason of the deoptimization and can we do something about it?
- Whether suggested changes is worth implementations?
Regards,
Sergey Tsypanov
1.
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"})
public class HashSetBenchmark {
@Benchmark
public HashSet<Integer> fromArrayList(Data data) {
return new HashSet<>(data.arrayList);
}
@Benchmark
public HashSet<Integer> fromHashSet(Data data) {
return new HashSet<>(data.hashSet);
}
@Benchmark
public HashSet<Integer> fromArraysAsList(Data data) {
return new HashSet<>(data.arraysAsList);
}
@State(Scope.Thread)
public static class Data {
private ArrayList<Integer> arrayList;
private HashSet<Integer> hashSet;
private List<Integer> arraysAsList;
@Param({"10", "100", "1000"})
private int length;
@Setup
public void setup() throws IOException {
var array = new Integer[length];
for (int i = 0; i < length; i++) {
array[i] = i;
}
arraysAsList = Arrays.asList(array);
arrayList = new ArrayList<>(arraysAsList);
hashSet = new HashSet<>(arraysAsList);
}
}
}
2.
baseline
Benchmark (length) Mode Cnt Score Error
Units
HashSetBenchmark.fromArrayList 10 avgt 50 127,838 ± 1,822
ns/op
HashSetBenchmark.fromArrayList 100 avgt 50 1209,543 ± 8,435
ns/op
HashSetBenchmark.fromArrayList 1000 avgt 50 11915,334 ± 155,908
ns/op
HashSetBenchmark.fromArraysAsList 10 avgt 50 130,049 ± 1,661
ns/op
HashSetBenchmark.fromArraysAsList 100 avgt 50 1138,407 ± 19,998
ns/op
HashSetBenchmark.fromArraysAsList 1000 avgt 50 11345,099 ± 83,086
ns/op
HashSetBenchmark.fromHashSet 10 avgt 50 184,559 ± 1,387
ns/op
HashSetBenchmark.fromHashSet 100 avgt 50 2001,577 ± 42,321
ns/op
HashSetBenchmark.fromHashSet 1000 avgt 50 17535,017 ± 199,892
ns/op
patched
Benchmark (length) Mode Cnt Score Error
Units
HashSetBenchmark.fromArrayList 10 avgt 50 125,075 ± 0,907
ns/op
HashSetBenchmark.fromArrayList 100 avgt 50 1127,378 ± 19,368
ns/op
HashSetBenchmark.fromArrayList 1000 avgt 50 11455,172 ± 61,543
ns/op
HashSetBenchmark.fromArraysAsList 10 avgt 50 117,634 ± 3,641
ns/op
HashSetBenchmark.fromArraysAsList 100 avgt 50 1034,538 ± 11,958
ns/op
HashSetBenchmark.fromArraysAsList 1000 avgt 50 10786,914 ± 137,330
ns/op
HashSetBenchmark.fromHashSet 10 avgt 50 173,727 ± 2,177
ns/op
HashSetBenchmark.fromHashSet 100 avgt 50 1748,858 ± 12,535
ns/op
HashSetBenchmark.fromHashSet 1000 avgt 50 16474,208 ± 97,578
ns/op