Re: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3]

Maurizio Cimadamore Mon, 15 Jul 2024 05:03:28 -0700

On Mon, 15 Jul 2024 11:47:43 GMT, Jorn Vernee <jver...@openjdk.org> wrote:


> I've updated the benchmark to run with 3 separate threads: 1 thread that is 
> just creating and closing shared arenas in a loop, 1 that is accessing memory 
> using the FFM API, and 1 that is accessing a `byte[]`.
> 
> Current:
> 
> ```
> Benchmark                                        Mode  Cnt   Score    Error  Units
> ConcurrentClose.sharedClose                      avgt   10  50.093 ±  6.200  us/op
> ConcurrentClose.sharedClose:closing              avgt   10  46.269 ±  0.786  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  98.072 ± 19.061  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   10   5.938 ±  0.058  us/op
> ```
> 
> I do see a pretty big difference on the memory segment accessing thread when 
> I remove deoptimization altogether:
> 
> ```
> Benchmark                                        Mode  Cnt   Score   Error  Units
> ConcurrentClose.sharedClose                      avgt   10  22.664 ± 0.409  us/op
> ConcurrentClose.sharedClose:closing              avgt   10  45.351 ± 1.554  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  16.671 ± 0.251  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   10   5.969 ± 0.089  us/op
> ```
> 
> When I remove the `has_scoped_access()` check before the deopt, I expect the 
> `otherAccess` thread to be affected, but the effect isn't nearly as big as 
> with the FFM thread. I think this is likely due to the `otherAccess` 
> benchmark being less sensitive to optimization (i.e. it already runs fairly 
> fast in the interpreter). I also tried using 
> `MethodHandles::arrayElementGetter` for the access, but the numbers I got 
> were pretty much the same:
> 
> ```
> Benchmark                                        Mode  Cnt    Score   Error  Units
> ConcurrentClose.sharedClose                      avgt   10   52.745 ± 1.071  us/op
> ConcurrentClose.sharedClose:closing              avgt   10   46.670 ± 0.453  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  102.663 ± 3.430  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   10    8.901 ± 0.109  us/op
> ```
> 
> I think, to really test the effect of the `has_scoped_access` check, we need 
> to look at a more realistic scenario.

Interesting benchmark. What is the baseline here? E.g. can we also compare 
against the same benchmark using a confined arena to do the closing?
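
To make the confined-vs-shared comparison concrete, here is a rough sketch of 
the two closing paths side by side. It is not the `ConcurrentClose` benchmark 
from the PR and it is not JMH: the class `ArenaCloseSketch` and its methods are 
hypothetical names, it runs single-threaded, and it assumes JDK 22 or later, 
where the FFM API is final. The timings it prints are only indicative, since 
there is no warmup or forking.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.function.Supplier;

// Hypothetical sketch (assumption): not the ConcurrentClose benchmark from the PR.
public class ArenaCloseSketch {

    // Creates a fresh arena, allocates and writes one int, then closes the arena.
    // Repeats `iterations` times and returns the average time per iteration in ns.
    static double avgCloseNanos(Supplier<Arena> factory, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            try (Arena arena = factory.get()) {
                MemorySegment segment = arena.allocate(ValueLayout.JAVA_INT);
                segment.set(ValueLayout.JAVA_INT, 0, i);
            } // arena.close() runs here
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    // Returns true if a segment is no longer alive once its arena has been closed.
    static boolean deadAfterClose(Supplier<Arena> factory) {
        MemorySegment segment;
        try (Arena arena = factory.get()) {
            segment = arena.allocate(ValueLayout.JAVA_INT);
        }
        return !segment.scope().isAlive();
    }

    public static void main(String[] args) {
        int iterations = 100_000;
        System.out.printf("confined: %.1f ns/op%n",
                avgCloseNanos(Arena::ofConfined, iterations));
        System.out.printf("shared:   %.1f ns/op%n",
                avgCloseNanos(Arena::ofShared, iterations));
    }
}
```

Passing the arena factory as a `Supplier<Arena>` keeps the two variants 
identical apart from `Arena::ofConfined` versus `Arena::ofShared`, which is 
the only difference a baseline run should have.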

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228335857
