On Wed, 28 Aug 2024 15:32:40 GMT, Francesco Nigro <d...@openjdk.org> wrote:

>>> How fast do we need to be here given we are measuring in a few nanoseconds 
>>> per operation?
>>> 
>>> What if the goal is not to regress from say explicitly filling in a small 
>>> sized segment or a comparable array (e.g., < 8 bytes) then maybe a loop 
>>> suffices and the code is simple?
>> 
>> Fair question. I have another version (called "patch bits" below) that is 
>> based on bit logic (first doing int ops, then short and lastly byte, similar 
>> to `ArraySupport::vectorizedMismatch`). This has slightly worse performance 
>> but is more scalable and perhaps simpler.
>> 
>> ![image](https://github.com/user-attachments/assets/292c75aa-0df8-4bb7-b45f-426d0f8470d9)
>
> @minborg Hi! I haven't checked the numbers with the benchmark I wrote at 
> https://github.com/openjdk/jdk/pull/20712#discussion_r1732802685 which is 
> meant to stress the branch predictor (without enough `samples` i.e. past 128K 
> on my machine) - could you give it a shot on an M1 🙏 ?

@franz1981 Here is what I get if I run your performance test on my M1 Mac 
(unfortunately no -perf data):


Benchmark                         (samples)  (shuffle)  Mode  Cnt        Score       Error  Units
TestBranchFill.heap_segment_fill       1024      false  avgt   30     3695.815 ±    24.615  ns/op
TestBranchFill.heap_segment_fill       1024       true  avgt   30     3938.582 ±   124.510  ns/op
TestBranchFill.heap_segment_fill     128000      false  avgt   30   420845.301 ±  1605.080  ns/op
TestBranchFill.heap_segment_fill     128000       true  avgt   30  1778362.506 ± 39250.756  ns/op
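
For readers following the quoted discussion, the "patch bits" idea mentioned above (int ops first, then short, then byte) can be sketched roughly as below. This is only an illustration under my own assumptions, not the code in the PR: `PatchBitsFill` and `fillSmall` are hypothetical names, and it writes to a plain `ByteBuffer` instead of a memory segment.

```java
import java.nio.ByteBuffer;

public class PatchBitsFill {

    // Hypothetical helper (not from the PR): fills buf[0, len) with `value`
    // for a small len (0..7) by testing the bits of len, so there is no
    // data-dependent loop: at most one int, one short and one byte store.
    public static void fillSmall(ByteBuffer buf, int len, byte value) {
        int b = value & 0xFF;
        int pos = 0;
        if ((len & 4) != 0) {
            // Replicate the byte into all four lanes of an int.
            buf.putInt(pos, b * 0x01010101);
            pos += 4;
        }
        if ((len & 2) != 0) {
            // Replicate the byte into both lanes of a short.
            buf.putShort(pos, (short) (b * 0x0101));
            pos += 2;
        }
        if ((len & 1) != 0) {
            buf.put(pos, value);
        }
    }

    public static void main(String[] args) {
        // Fill every small length once and print the resulting bytes.
        for (int len = 0; len < 8; len++) {
            ByteBuffer buf = ByteBuffer.allocate(8);
            fillSmall(buf, len, (byte) 0x5A);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                sb.append(String.format("%02x ", buf.get(i)));
            }
            System.out.println("len=" + len + ": " + sb.toString().trim());
        }
    }
}
```

Because every replicated byte is identical, the byte order of the buffer does not matter here. Each of the three size checks is still a conditional branch, so a shuffled mix of sizes can still be hard to predict.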

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20712#issuecomment-2321048180
