On Fri, 30 Aug 2024 14:15:24 GMT, Maurizio Cimadamore <mcimadam...@openjdk.org> 
wrote:

>> src/java.base/share/classes/jdk/internal/foreign/AbstractMemorySegmentImpl.java
>>  line 208:
>> 
>>> 206:             }
>>> 207:             final long u = Byte.toUnsignedLong(value);
>>> 208:             final long longValue = u << 56 | u << 48 | u << 40 | u << 32 | u << 24 | u << 16 | u << 8 | u;
>> 
>> this can be u * 0xFFFFFFFFFFFFL if value != 0 and just 0L if not: not sure 
>> if fast(er), need to measure.
>> 
>> Most of the time filling is happy with 0 since zeroing is the most common 
>> case
>
> It's a clever trick. However, I was looking at similar tricks and found that 
> the time spent here is irrelevant (e.g. I tried to always force `0` as the 
> value, and couldn't see any difference).

If I run:


    @Benchmark
    public long shift() {
        return ELEM_SIZE << 56 | ELEM_SIZE << 48 | ELEM_SIZE << 40 | ELEM_SIZE << 32
             | ELEM_SIZE << 24 | ELEM_SIZE << 16 | ELEM_SIZE << 8 | ELEM_SIZE;
    }

    @Benchmark
    public long mul() {
        return ELEM_SIZE * 0xFFFF_FFFF_FFFFL;
    }

Then I get:

Benchmark       (ELEM_SIZE)  Mode  Cnt  Score   Error  Units
TestFill.mul             31  avgt   30  0.586 ± 0.045  ns/op
TestFill.shift           31  avgt   30  0.938 ± 0.017  ns/op

This is on my M1 machine.
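
A note on equivalence, as a sketch rather than part of the measurement above: assuming the goal is to replicate the low byte of `u` into all eight bytes of a `long`, the multiplier that matches the shift chain is `0x0101_0101_0101_0101L`. The constant `0xFFFF_FFFF_FFFFL` is 2^48 - 1, which yields a different bit pattern, so the two benchmark methods above likely do not compute the same value. The class and method names below are made up for illustration only.

```java
public class ByteBroadcast {

    // Replicates the byte into all eight byte lanes using shifts and ORs,
    // mirroring the expression quoted from AbstractMemorySegmentImpl.
    static long broadcastShift(byte value) {
        final long u = Byte.toUnsignedLong(value);
        return u << 56 | u << 48 | u << 40 | u << 32 | u << 24 | u << 16 | u << 8 | u;
    }

    // Same result with a single multiplication: the constant has a 1 in every
    // byte lane, and u < 256, so no lane carries into its neighbour.
    static long broadcastMul(byte value) {
        return Byte.toUnsignedLong(value) * 0x0101_0101_0101_0101L;
    }

    public static void main(String[] args) {
        // Exhaustively compare both forms over all 256 byte values.
        for (int i = 0; i < 256; i++) {
            byte b = (byte) i;
            if (broadcastShift(b) != broadcastMul(b)) {
                throw new AssertionError("mismatch for " + i);
            }
        }
        // Print the broadcast value next to the product with the 48-bit
        // all-ones constant, to show that the two differ.
        System.out.println(Long.toHexString(broadcastShift((byte) 31)));
        System.out.println(Long.toHexString(31L * 0xFFFF_FFFF_FFFFL));
    }
}
```

Whether the multiply form is actually faster would still need to be measured with the corrected constant; the check above only speaks to correctness.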

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20712#discussion_r1740564110
