On Mon, 15 Jul 2024 11:47:43 GMT, Jorn Vernee <jver...@openjdk.org> wrote:
> I've updated the benchmark to run with 3 separate threads: 1 thread that is
> just creating and closing shared arenas in a loop, 1 that is accessing memory
> using the FFM API, and 1 that is accessing a `byte[]`.
>
> Current:
>
> ```
> Benchmark                                        Mode  Cnt    Score    Error  Units
> ConcurrentClose.sharedClose                      avgt   10   50.093 ±  6.200  us/op
> ConcurrentClose.sharedClose:closing              avgt   10   46.269 ±  0.786  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10   98.072 ± 19.061  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   10    5.938 ±  0.058  us/op
> ```
>
> I do see a pretty big difference on the memory segment accessing thread when
> I remove deoptimization altogether:
>
> ```
> Benchmark                                        Mode  Cnt    Score    Error  Units
> ConcurrentClose.sharedClose                      avgt   10   22.664 ±  0.409  us/op
> ConcurrentClose.sharedClose:closing              avgt   10   45.351 ±  1.554  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10   16.671 ±  0.251  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   10    5.969 ±  0.089  us/op
> ```
>
> When I remove the `has_scoped_access()` check before the deopt, I expect the
> `otherAccess` thread to be affected, but the effect isn't nearly as big as
> with the FFM thread. I think this is likely due to the `otherAccess`
> benchmark being less sensitive to optimization (i.e. it already runs fairly
> fast in the interpreter). I also tried using
> `MethodHandles::arrayElementGetter` for the access, but the numbers I got
> were pretty much the same:
>
> ```
> Benchmark                                        Mode  Cnt    Score    Error  Units
> ConcurrentClose.sharedClose                      avgt   10   52.745 ±  1.071  us/op
> ConcurrentClose.sharedClose:closing              avgt   10   46.670 ±  0.453  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  102.663 ±  3.430  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   10    8.901 ±  0.109  us/op
> ```
>
> I think, to really test the effect of the `has_scoped_access` check, we need
> to look at a more realistic scenario.

Interesting benchmark. What is the baseline here? E.g. can we also compare
against the same benchmark that is using a confined arena to do the closing?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228335857
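
For illustration only, here is a minimal sketch of what a confined-arena baseline for the closing thread might look like. It is not the `ConcurrentClose` benchmark from the PR and does not use JMH; the class name `ArenaCloseBaseline` and the helper `avgCloseNanos` are hypothetical, and it assumes a JDK where `java.lang.foreign` is final (22 or later). A plain `System.nanoTime()` loop like this is only a rough indication, not a substitute for a proper JMH run.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.function.Supplier;

public class ArenaCloseBaseline {

    // Hypothetical helper: opens and closes `iterations` arenas obtained from
    // `factory` and returns the average time per open/close pair in nanoseconds.
    static double avgCloseNanos(Supplier<Arena> factory, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // try-with-resources closes the arena at the end of each iteration
            try (Arena arena = factory.get()) {
                MemorySegment seg = arena.allocate(8);
                seg.set(ValueLayout.JAVA_BYTE, 0, (byte) 1); // touch the memory once
            }
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    public static void main(String[] args) {
        int iterations = 100_000;

        // Warm up both paths so the timed runs are not dominated by interpretation.
        avgCloseNanos(Arena::ofConfined, iterations);
        avgCloseNanos(Arena::ofShared, iterations);

        double confined = avgCloseNanos(Arena::ofConfined, iterations);
        double shared = avgCloseNanos(Arena::ofShared, iterations);

        System.out.printf("confined close: %.1f ns/op%n", confined);
        System.out.printf("shared close:   %.1f ns/op%n", shared);
    }
}
```

Closing a confined arena needs no cross-thread coordination, so swapping `Arena::ofShared` for `Arena::ofConfined` in the closing thread would give the other two threads a reference point without the shared-close handshake.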