On Mon, 26 Aug 2024 14:25:31 GMT, Per Minborg <pminb...@openjdk.org> wrote:

>> It is true, that this is a compromise where we give up inline space,
>> code-cache space, and introduce added complexity against the prospect of
>> better small-size performance. Depending on the workload, this may or may
>> not pay off. In the (presumably common) case where we allocate/fill small
>> segments of constant sizes, this is likely a win. Writing a dynamic
>> performance test sounds like a good idea.
>
> Here is a benchmark that fills segments of various random sizes:
>
>
> @BenchmarkMode(Mode.AverageTime)
> @Warmup(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
> @Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
> @State(Scope.Thread)
> @OutputTimeUnit(TimeUnit.NANOSECONDS)
> @Fork(value = 3)
> public class TestFill {
>
>     private static final int SIZE = 16;
>     private static final int[] INDICES = new Random(42).ints(0, 8)
>             .limit(SIZE)
>             .toArray();
>
>
>     private MemorySegment[] segments;
>
>     @Setup
>     public void setup() {
>         segments = IntStream.of(INDICES)
>                 .mapToObj(i -> MemorySegment.ofArray(new byte[i]))
>                 .toArray(MemorySegment[]::new);
>     }
>
>     @Benchmark
>     public void heap_segment_fill() {
>         for (int i = 0; i < SIZE; i++) {
>             segments[i].fill((byte) 0);
>         }
>     }
>
> }
>
>
> This produces the following on my Mac M1:
>
>
> Benchmark                   Mode  Cnt   Score   Error  Units
> TestFill.heap_segment_fill  avgt   30  59.054 ± 3.723  ns/op
>
>
> On average, an operation will take 59/16 = ~3 ns per operation (including
> looping).
>
> A test with the same size for every benchmark looks like this on my machine:
>
>
> Benchmark                   (ELEM_SIZE)  Mode  Cnt  Score   Error  Units
> TestFill.heap_segment_fill            0  avgt   30  1.112 ± 0.027  ns/op
> TestFill.heap_segment_fill            1  avgt   30  1.602 ± 0.060  ns/op
> TestFill.heap_segment_fill            2  avgt   30  1.583 ± 0.004  ns/op
> TestFill.heap_segment_fill            3  avgt   30  1.909 ± 0.055  ns/op
> TestFill.heap_segment_fill            4  avgt   30  1.605 ± 0.059  ns/op
> TestFill.heap_segment_fill            5  avgt   30  1.900 ± 0.064  ns/op
> TestFill.heap_segment_fill            6  avgt   30  1.891 ± 0.038  ns/op
> TestFill.heap_segment_fill            7  avgt   30  2.237 ± 0.091  ns/op

As discussed offline, can't we use a stable array of functions or something
like that which can be populated lazily? That way you can access the function
you want in a single array access, and we could put all these helper methods
somewhere else.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20712#discussion_r1731855496
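
For illustration only, here is a minimal sketch of what a lazily populated
table of fill functions could look like. It is not the code in the PR. The
names LazyFillTable, Filler, fillerFor and MAX_SMALL_SIZE are hypothetical,
and the sketch works on a plain byte[] rather than a MemorySegment so that it
stays self-contained. Inside the JDK the array would presumably carry the
internal @Stable annotation so that the JIT can constant-fold entries once
they are set; that annotation is not available to ordinary code, so it only
appears in a comment here.

```java
import java.util.Arrays;

final class LazyFillTable {

    /** Hypothetical per-size fill routine; not a JDK type. */
    @FunctionalInterface
    interface Filler {
        void fill(byte[] target, byte value);
    }

    // Sizes 0..7 get a dedicated routine, matching the range benchmarked above.
    static final int MAX_SMALL_SIZE = 8;

    // Assumption: in the JDK this would be a @Stable array (internal annotation).
    // Here it is a plain array whose slots start out null and are filled on demand.
    private static final Filler[] FILLERS = new Filler[MAX_SMALL_SIZE];

    /** Returns the routine for the given size, creating it on first use. */
    static Filler fillerFor(int size) {
        Filler f = FILLERS[size];          // the single array access on the fast path
        if (f == null) {
            // Two threads may both create an entry for the same size; the
            // entries are interchangeable, so whichever write lands last is fine.
            f = createFiller(size);
            FILLERS[size] = f;
        }
        return f;
    }

    // Stand-in for the specialized helper methods, which could live elsewhere.
    private static Filler createFiller(int size) {
        return (target, value) -> {
            for (int i = 0; i < size; i++) {
                target[i] = value;
            }
        };
    }

    /** Dispatches small arrays through the table and large ones to a bulk fill. */
    static void fill(byte[] target, byte value) {
        if (target.length < MAX_SMALL_SIZE) {
            fillerFor(target.length).fill(target, value);
        } else {
            Arrays.fill(target, value);
        }
    }
}
```

A caller would simply invoke LazyFillTable.fill(bytes, (byte) 0). Only the
sizes that are actually used ever get an entry, and the helpers themselves no
longer need to sit next to the dispatch code.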