I would guess that an array element access is more expensive than a
field access, regardless of volatile. For an array element the offset is
computed at runtime and may need a bounds check against the valid index
range, while the offset of a field is known in advance and can be a
constant in the instruction stream.
Different access costs probably prevent a direct comparison of the
non-hoisted/hoisted ratios.
You would also probably have to look at the generated code.
For example, in the hoisted field case the summing loop could be
replaced by a single multiplication, value * count.
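
To illustrate the last point, here is a minimal sketch of why such a replacement would be legal. The class and method names are hypothetical and not part of the benchmark, and whether the JIT actually performs this transformation here is an assumption that only the generated code can confirm.

```java
// Hypothetical illustration class; not part of the original benchmark.
final class SumVersusMultiply {

    // Mirrors the hoisted field loop: adds the same value count times.
    static int sumInLoop(int value, int count) {
        int sum = 0;
        for (int i = 0; i < count; i++) {
            sum += value;
        }
        return sum;
    }

    // Closed form a compiler could substitute for the loop.
    // Java int arithmetic wraps modulo 2^32, so both forms agree even
    // when the sum overflows; a non-positive count yields 0 as in the loop.
    static int multiply(int value, int count) {
        return count > 0 ? value * count : 0;
    }

    public static void main(String[] args) {
        int value = 1_234_567_891; // large enough to overflow for count >= 2
        for (int count : new int[] {0, 1, 10, 100}) {
            boolean same = sumInLoop(value, count) == multiply(value, count);
            System.out.println("count=" + count + " same=" + same);
        }
    }
}
```

If the compiler did this, the hoisted field benchmark would no longer scale with count, which would make its numbers incomparable to the array case.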
On 2023-08-16 16:06, Сергей Цыпанов wrote:
I meant the relation between in-loop and hoisted access.
In Java 19, when we take count = 100 for the volatile array and hoist the read out of the loop, the average time decreases from 146 to 33 ns.
If we take the same count for the "plain" field and hoist the read out of the loop, the average time decreases from 98 to 7 ns.
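
The relation described here can be worked out from the Java 19, count = 100 scores in the original message. This is only a small arithmetic sketch; the class name is hypothetical, and the ratios ignore the fairly wide error margins reported for that run.

```java
import java.util.Locale;

// Hypothetical helper; not part of the original benchmark.
final class HoistRatio {

    // How many times slower the in-loop volatile read is than the hoisted one.
    static double ratio(double inLoopNs, double hoistedNs) {
        return inLoopNs / hoistedNs;
    }

    public static void main(String[] args) {
        // Java 19, count = 100, scores in ns/op taken from the posted results.
        double array = ratio(146.497, 33.262);
        double field = ratio(98.648, 7.126);
        System.out.println(String.format(Locale.ROOT, "array: %.1fx", array));
        System.out.println(String.format(Locale.ROOT, "field: %.1fx", field));
    }
}
```

The array ratio comes out at roughly 4.4 and the field ratio at roughly 13.8, which is the difference the question is about.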
If I read the data correctly, for the count=100 case in jdk 20 it takes
109 ns/op for the array and 74 ns/op for the field.
To me this looks like a field access is _less_ expensive.
Am I missing something?
On 2023-08-16 13:37, Сергей Цыпанов wrote:
Hello,
I was measuring the cost of hoisting a volatile access out of a loop and found
that the numbers differ for arrays and "plain" references.
Here's the benchmark for the array:
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(time = 2, iterations = 5)
@Measurement(time = 2, iterations = 5)
@Fork(value = 4, jvmArgs = "-Xmx1g")
public class VolatileArrayInLoopBenchmark {

    // Reads the volatile array reference on every iteration.
    @Benchmark
    public int accessVolatileInLoop(Data data) {
        int sum = 0;
        for (int i = 0; i < data.count; i++) {
            sum += data.ints[i];
        }
        return sum;
    }

    // Reads the volatile array reference once, before the loop.
    @Benchmark
    public int hoistVolatileFromLoop(Data data) {
        int sum = 0;
        int[] ints = data.ints;
        for (int i = 0; i < data.count; i++) {
            sum += ints[i];
        }
        return sum;
    }

    @State(Scope.Benchmark)
    public static class Data {
        @Param({"1", "10", "100"})
        private int count;

        private volatile int[] ints;

        @Setup
        public void setUp() {
            int[] ints = new int[count];
            for (int i = 0; i < ints.length; i++) {
                ints[i] = ThreadLocalRandom.current().nextInt();
            }
            this.ints = ints;
        }
    }
}
and this one is for reference:
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(time = 2, iterations = 5)
@Measurement(time = 2, iterations = 5)
@Fork(value = 4, jvmArgs = "-Xmx1g")
public class VolatileFieldInLoopBenchmark {

    // Reads the volatile field on every iteration.
    @Benchmark
    public int accessVolatileInLoop(Data data) {
        int sum = 0;
        for (int i = 0; i < data.count; i++) {
            sum += data.value;
        }
        return sum;
    }

    // Reads the volatile field once, before the loop.
    @Benchmark
    public int hoistVolatileFromLoop(Data data) {
        int sum = 0;
        int value = data.value;
        for (int i = 0; i < data.count; i++) {
            sum += value;
        }
        return sum;
    }

    @State(Scope.Benchmark)
    public static class Data {
        private final ThreadLocalRandom random = ThreadLocalRandom.current();

        private volatile int value = random.nextInt();

        @Param({"1", "10", "100"})
        private int count;
    }
}
From the measurement results it looks like volatile array access is relatively
cheaper than "plain" reference access:
Java 19
Benchmark                                           (count)  Mode  Cnt    Score    Error  Units
VolatileArrayInLoopBenchmark.accessVolatileInLoop         1  avgt   20    2.110 ±  0.404  ns/op
VolatileArrayInLoopBenchmark.accessVolatileInLoop        10  avgt   20   14.836 ±  2.825  ns/op
VolatileArrayInLoopBenchmark.accessVolatileInLoop       100  avgt   20  146.497 ± 25.786  ns/op
VolatileArrayInLoopBenchmark.hoistVolatileFromLoop        1  avgt   20    3.006 ±  0.686  ns/op
VolatileArrayInLoopBenchmark.hoistVolatileFromLoop       10  avgt   20    6.222 ±  1.215  ns/op
VolatileArrayInLoopBenchmark.hoistVolatileFromLoop      100  avgt   20   33.262 ±  6.579  ns/op
VolatileFieldInLoopBenchmark.accessVolatileInLoop         1  avgt   20    1.823 ±  0.382  ns/op
VolatileFieldInLoopBenchmark.accessVolatileInLoop        10  avgt   20   10.259 ±  2.874  ns/op
VolatileFieldInLoopBenchmark.accessVolatileInLoop       100  avgt   20   98.648 ± 18.500  ns/op
VolatileFieldInLoopBenchmark.hoistVolatileFromLoop        1  avgt   20    2.189 ±  0.412  ns/op
VolatileFieldInLoopBenchmark.hoistVolatileFromLoop       10  avgt   20    4.734 ±  0.891  ns/op
VolatileFieldInLoopBenchmark.hoistVolatileFromLoop      100  avgt   20    7.126 ±  1.309  ns/op
Java 20
Benchmark                                           (count)  Mode  Cnt    Score    Error  Units
VolatileArrayInLoopBenchmark.accessVolatileInLoop         1  avgt   20    1.714 ±  0.066  ns/op
VolatileArrayInLoopBenchmark.accessVolatileInLoop        10  avgt   20   10.703 ±  0.148  ns/op
VolatileArrayInLoopBenchmark.accessVolatileInLoop       100  avgt   20  109.001 ±  1.866  ns/op
VolatileArrayInLoopBenchmark.hoistVolatileFromLoop        1  avgt   20    2.408 ±  0.224  ns/op
VolatileArrayInLoopBenchmark.hoistVolatileFromLoop       10  avgt   20    4.678 ±  0.060  ns/op
VolatileArrayInLoopBenchmark.hoistVolatileFromLoop      100  avgt   20   24.711 ±  1.091  ns/op
VolatileFieldInLoopBenchmark.accessVolatileInLoop         1  avgt   20    1.366 ±  0.105  ns/op
VolatileFieldInLoopBenchmark.accessVolatileInLoop        10  avgt   20    7.388 ±  0.119  ns/op
VolatileFieldInLoopBenchmark.accessVolatileInLoop       100  avgt   20   74.630 ±  1.163  ns/op
VolatileFieldInLoopBenchmark.hoistVolatileFromLoop        1  avgt   20    1.653 ±  0.035  ns/op
VolatileFieldInLoopBenchmark.hoistVolatileFromLoop       10  avgt   20    3.138 ±  0.040  ns/op
VolatileFieldInLoopBenchmark.hoistVolatileFromLoop      100  avgt   20    4.945 ±  0.177  ns/op
So my question is: why is volatile reference access relatively more expensive
than volatile array access?