On Wed, 28 Aug 2024 09:06:48 GMT, Per Minborg <pminb...@openjdk.org> wrote:
>> How fast do we need to be here given we are measuring in a few nanoseconds
>> per operation?
>>
>> What if the goal is not to regress from say explicitly filling in a small
>> sized segment or a comparable array (e.g., < 8 bytes) then maybe a loop
>> suffices and the code is simple?
>
> Fair question. I have another version (called "patch bits" below) that is
> based on bit logic (first doing int ops, then short and lastly byte, similar
> to `ArraySupport::vectorizedMismatch`). This has slightly worse performance
> but is more scalable and perhaps simpler.

@minborg Hi! I haven't checked the numbers with the benchmark I wrote at https://github.com/openjdk/jdk/pull/20712#discussion_r1732802685, which is meant to stress the branch predictor (it needs enough `samples` to do so, i.e. past 128K on my machine). Can you give it a shot on M1 🙏?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20712#issuecomment-2315685287
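
For readers following the thread, the "patch bits" idea quoted above (bulk operations first, then an int, a short and a byte selected by the low bits of the length) could look roughly like the sketch below. This is not the code from the PR: the class name `PatchBitsFill` and its `fill` method are hypothetical, and it works on a plain `byte[]` through `ByteBuffer` rather than on a memory segment, purely to illustrate the bit logic.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Objects;

// Hypothetical illustration only; not the implementation discussed in the PR.
final class PatchBitsFill {

    // Fills dst[offset, offset + length) with 'value'.
    static void fill(byte[] dst, int offset, int length, byte value) {
        Objects.checkFromIndexSize(offset, length, dst.length);
        ByteBuffer buf = ByteBuffer.wrap(dst).order(ByteOrder.nativeOrder());

        // Broadcast the byte into all eight lanes of a long.
        long pattern = (value & 0xFFL) * 0x0101010101010101L;

        // Bulk part: eight bytes per iteration.
        int pos = offset;
        int bulkEnd = offset + (length & ~7);
        for (; pos < bulkEnd; pos += 8) {
            buf.putLong(pos, pattern);
        }

        // Tail part: at most 7 bytes remain, so bits 2, 1 and 0 of the
        // length say whether an int, a short and a byte are still needed.
        int rem = length & 7;
        if ((rem & 4) != 0) {
            buf.putInt(pos, (int) pattern);
            pos += 4;
        }
        if ((rem & 2) != 0) {
            buf.putShort(pos, (short) pattern);
            pos += 2;
        }
        if ((rem & 1) != 0) {
            buf.put(pos, value);
        }
    }
}
```

The tail uses three independent tests instead of a loop whose trip count depends on the length, which is the property a benchmark aimed at the branch predictor would exercise. Whether this beats a simple loop for small sizes is exactly the open question in this thread.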