On Wed, 28 Aug 2024 15:32:40 GMT, Francesco Nigro <d...@openjdk.org> wrote:
>>> How fast do we need to be here given we are measuring in a few nanoseconds
>>> per operation?
>>>
>>> What if the goal is not to regress from say explicitly filling in a small
>>> sized segment or a comparable array (e.g., < 8 bytes) then maybe a loop
>>> suffices and the code is simple?
>>
>> Fair question. I have another version (called "patch bits" below) that is
>> based on bit logic (first doing int ops, then short and lastly byte, similar
>> to `ArraySupport::vectorizedMismatch`). This has slightly worse performance
>> but is more scalable and perhaps simpler.
>>
>
> @minborg Hi! I didn't check the numbers with the benchmark I've written at
> https://github.com/openjdk/jdk/pull/20712#discussion_r1732802685 which is
> meant to stress the branch predictor (without enough `samples` i.e. past 128K
> on my machine) - can you give it a shot with M1 🙏 ?

@franz1981 Here is what I get if I run your performance test on my M1 Mac (unfortunately no -perf data):

    Benchmark                         (samples)  (shuffle)  Mode  Cnt        Score        Error  Units
    TestBranchFill.heap_segment_fill       1024      false  avgt   30     3695.815 ±     24.615  ns/op
    TestBranchFill.heap_segment_fill       1024       true  avgt   30     3938.582 ±    124.510  ns/op
    TestBranchFill.heap_segment_fill     128000      false  avgt   30   420845.301 ±   1605.080  ns/op
    TestBranchFill.heap_segment_fill     128000       true  avgt   30  1778362.506 ±  39250.756  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20712#issuecomment-2321048180
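
For readers unfamiliar with the "patch bits" idea quoted above (int ops first, then short, then byte), here is a minimal sketch of how such a small fill could look. It is not the code from the PR: the class `PatchBitsFill` and the helper `fillSmall` are hypothetical names, and a plain `ByteBuffer` stands in for a memory segment so the example runs on any recent JDK. It assumes the length is below 8 bytes, the small-size case discussed in the thread.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration only; not the implementation under review.
final class PatchBitsFill {

    // Fills the first len bytes (0 <= len < 8) of buf with value, using the
    // bits of len to select at most one int, one short and one byte store.
    static void fillSmall(ByteBuffer buf, int len, byte value) {
        if (len < 0 || len >= 8) {
            throw new IllegalArgumentException("len must be in [0, 8): " + len);
        }
        // Broadcast the byte into all four byte lanes of an int. Every lane
        // holds the same value, so the buffer's byte order does not matter.
        int v = (value & 0xFF) * 0x01010101;
        int offset = 0;
        if ((len & 4) != 0) {            // bit 2 set: store 4 bytes
            buf.putInt(offset, v);
            offset += 4;
        }
        if ((len & 2) != 0) {            // bit 1 set: store 2 bytes
            buf.putShort(offset, (short) v);
            offset += 2;
        }
        if ((len & 1) != 0) {            // bit 0 set: store 1 byte
            buf.put(offset, value);
        }
    }

    public static void main(String[] args) {
        // Fill 0..7 bytes of a zeroed 8-byte buffer and print the result.
        for (int len = 0; len < 8; len++) {
            ByteBuffer buf = ByteBuffer.allocate(8);
            fillSmall(buf, len, (byte) 0x2A);
            System.out.println(len + " -> " + Arrays.toString(buf.array()));
        }
    }
}
```

Each of the three branches depends only on one bit of the length, so a fill of fewer than 8 bytes takes at most three stores regardless of the exact size. Whether those branches are predicted well for shuffled sizes is what the benchmark above is meant to probe.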