On Fri, 30 Aug 2024 14:15:24 GMT, Maurizio Cimadamore <mcimadam...@openjdk.org> wrote:

>> src/java.base/share/classes/jdk/internal/foreign/AbstractMemorySegmentImpl.java line 208:
>>
>>> 206:         }
>>> 207:         final long u = Byte.toUnsignedLong(value);
>>> 208:         final long longValue = u << 56 | u << 48 | u << 40 | u << 32 | u << 24 | u << 16 | u << 8 | u;
>>
>> this can be u * 0xFFFFFFFFFFFFL if value != 0 and just 0L if not: not sure
>> if fast(er), need to measure.
>>
>> Most of the time filling is happy with 0 since zeroing is the most common
>> case
>
> It's a clever trick. However, I was looking at similar tricks and found that
> the time spent here is irrelevant (e.g. I tried to always force `0` as the
> value, and couldn't see any difference).

If I run:

    @Benchmark
    public long shift() {
        return ELEM_SIZE << 56 | ELEM_SIZE << 48 | ELEM_SIZE << 40 | ELEM_SIZE << 32 | ELEM_SIZE << 24 | ELEM_SIZE << 16 | ELEM_SIZE << 8 | ELEM_SIZE;
    }

    @Benchmark
    public long mul() {
        return ELEM_SIZE * 0xFFFF_FFFF_FFFFL;
    }

Then I get:

    Benchmark       (ELEM_SIZE)  Mode  Cnt  Score   Error  Units
    TestFill.mul             31  avgt   30  0.586 ± 0.045  ns/op
    TestFill.shift           31  avgt   30  0.938 ± 0.017  ns/op

On my M1 machine.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20712#discussion_r1740564110
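
A note on the constant, in case it helps anyone reproducing the numbers: multiplying by 0xFFFF_FFFF_FFFFL does not appear to yield the same value as the shift/or expression on line 208. The multiplier that replicates an unsigned byte into all eight bytes of a long is 0x0101_0101_0101_0101L. The sketch below checks the two forms against each other for every byte value. `BroadcastByte`, `viaShift` and `viaMultiply` are hypothetical names used only for illustration; none of this is code from the PR, and it says nothing about which form is faster.

```java
// Hypothetical helper class for illustration; not part of the PR under review.
final class BroadcastByte {

    // Same expression as line 208 of AbstractMemorySegmentImpl.java quoted above.
    static long viaShift(byte value) {
        final long u = Byte.toUnsignedLong(value);
        return u << 56 | u << 48 | u << 40 | u << 32 | u << 24 | u << 16 | u << 8 | u;
    }

    // A constant with 0x01 in every byte copies u into each byte position.
    // Since u <= 0xFF, no carry crosses a byte boundary.
    static long viaMultiply(byte value) {
        return Byte.toUnsignedLong(value) * 0x0101_0101_0101_0101L;
    }

    public static void main(String[] args) {
        // Exhaustive check over all 256 byte values.
        for (int i = 0; i < 256; i++) {
            byte b = (byte) i;
            if (viaShift(b) != viaMultiply(b)) {
                throw new AssertionError("mismatch for byte " + i);
            }
        }
        // 31 is the ELEM_SIZE used in the benchmark above.
        System.out.println(Long.toHexString(viaMultiply((byte) 31))); // prints 1f1f1f1f1f1f1f1f
    }
}
```

If the benchmark is rerun with this constant, mul() and shift() would at least be computing the same result, which makes the comparison easier to interpret.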