On Fri, 17 Jan 2025 14:58:37 GMT, Jorn Vernee <jver...@openjdk.org> wrote:
> Could you add the benchmark you're using to the PR as well?

Done. I slotted it into the "points" benchmark suite. I had to define another struct, "DoublePoint", though, since the existing int/int pair gets packed into a long.

Full disclosure: I'm not sure how to run it inside the JDK build structure, so I ran it outside instead. I hope it builds (`make test TEST="micro:java.lang.foreign.points"` => `Error: Unable to access jarfile /Users/mernst/IdeaProjects/jdk/build/macosx-aarch64-server-fastdebug/images/test/micro/benchmarks.jar`).

It exercises a loop like this:

    struct DoublePoint { double x; double y; }

    DoublePoint unit_rotate(double phi);                <== HFA requires intermediate buffer
    void unit_rotate_ptr(DoublePoint* out, double phi); <== reference, no intermediate buffer

    DoublePoint *points = new DoublePoint[N];
    for (i in 0...N) points[i] = unit_rotate(2*pi*i/N);
    vs
    for (i in 0...N) unit_rotate_ptr(points+i, 2*pi*i/N);

It is now almost competitive and the memory profile looks a lot better:

    # VM version: JDK 25-ea, OpenJDK 64-Bit Server VM, 25-ea+3-283
    Benchmark                                        Mode  Cnt      Score     Error   Units
    PointsAlloc.circle_by_ptr                        avgt    5      8.964 ±   0.351   ns/op
    PointsAlloc.circle_by_ptr:·gc.alloc.rate         avgt    5     95.301 ±   3.665  MB/sec
    PointsAlloc.circle_by_ptr:·gc.alloc.rate.norm    avgt    5      0.224 ±   0.001    B/op
    PointsAlloc.circle_by_ptr:·gc.count              avgt    5      2.000            counts
    PointsAlloc.circle_by_ptr:·gc.time               avgt    5      3.000                ms
    PointsAlloc.circle_by_value                      avgt    5     46.498 ±   2.336   ns/op
    PointsAlloc.circle_by_value:·gc.alloc.rate       avgt    5  13141.578 ± 650.425  MB/sec
    PointsAlloc.circle_by_value:·gc.alloc.rate.norm  avgt    5    160.224 ±   0.001    B/op
    PointsAlloc.circle_by_value:·gc.count            avgt    5    116.000            counts
    PointsAlloc.circle_by_value:·gc.time             avgt    5     44.000                ms

    # VM version: JDK 25-internal, OpenJDK 64-Bit Server VM, 25-internal-adhoc.mernst.jdk
    Benchmark                                        Mode  Cnt      Score     Error   Units
    PointsAlloc.circle_by_ptr                        avgt    5      9.108 ±   0.477   ns/op
    PointsAlloc.circle_by_ptr:·gc.alloc.rate         avgt    5     93.792 ±   4.898  MB/sec
    PointsAlloc.circle_by_ptr:·gc.alloc.rate.norm    avgt    5      0.224 ±   0.001    B/op
    PointsAlloc.circle_by_ptr:·gc.count              avgt    5      2.000            counts
    PointsAlloc.circle_by_ptr:·gc.time               avgt    5      4.000                ms
    PointsAlloc.circle_by_value                      avgt    5     13.180 ±   0.611   ns/op
    PointsAlloc.circle_by_value:·gc.alloc.rate       avgt    5     64.816 ±   2.964  MB/sec
    PointsAlloc.circle_by_value:·gc.alloc.rate.norm  avgt    5      0.224 ±   0.001    B/op
    PointsAlloc.circle_by_value:·gc.count            avgt    5      2.000            counts
    PointsAlloc.circle_by_value:·gc.time             avgt    5      5.000                ms

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23142#issuecomment-2599586149
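
For anyone who wants to try the native side of the loop quoted in the message, here is a minimal compilable C++ sketch of it. The function bodies are an assumption: the message only gives the two signatures, and cos/sin is a guess based on the name `unit_rotate`. The helpers `circle_by_value` and `circle_by_ptr` borrow their names from the benchmark methods purely for illustration; they are hypothetical and not the code in the PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Struct of two doubles, as in the message.
struct DoublePoint {
    double x;
    double y;
};

// By-value variant. ASSUMPTION: the body is a guess; only the signature
// appears in the message.
DoublePoint unit_rotate(double phi) {
    return DoublePoint{std::cos(phi), std::sin(phi)};
}

// By-pointer variant: the caller supplies the destination.
// ASSUMPTION: same guessed body as above.
void unit_rotate_ptr(DoublePoint* out, double phi) {
    out->x = std::cos(phi);
    out->y = std::sin(phi);
}

// Hypothetical helper: fills n points on the unit circle via the
// by-value call, mirroring the first loop in the message.
std::vector<DoublePoint> circle_by_value(std::size_t n) {
    assert(n > 0);  // guards the division below
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<DoublePoint> points(n);
    for (std::size_t i = 0; i < n; ++i) {
        points[i] = unit_rotate(two_pi * static_cast<double>(i) / static_cast<double>(n));
    }
    return points;
}

// Hypothetical helper: same result, but each point is written in place
// through a pointer, mirroring the second loop in the message.
std::vector<DoublePoint> circle_by_ptr(std::size_t n) {
    assert(n > 0);  // guards the division below
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<DoublePoint> points(n);
    for (std::size_t i = 0; i < n; ++i) {
        unit_rotate_ptr(points.data() + i, two_pi * static_cast<double>(i) / static_cast<double>(n));
    }
    return points;
}
```

Both helpers produce the same points. The two functions differ only in how the result is handed back to the caller, which is the difference the `circle_by_value` and `circle_by_ptr` benchmark rows compare when the functions are called from Java.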