On Fri, 18 Oct 2024 05:37:16 GMT, Quan Anh Mai <qa...@openjdk.org> wrote:

>> src/hotspot/share/opto/vectornode.cpp line 2122:
>> 
>>> 2120:     // MulL (URShift SRC1 , 32) (URShift SRC2, 32)
>>> 2121:     // MulL (URShift SRC1 , 32)  ( And  SRC2,  0xFFFFFFFF)
>>> 2122:     // MulL ( And  SRC1,  0xFFFFFFFF) (URShift SRC2 , 32)
>> 
>> I don't understand how it works... According to the documentation, 
>> `VPMULDQ`/`VPMULUDQ` consume vectors of double words and produce a vector of 
>> quadwords. But it looks like `SRC1`/`SRC2` are always vectors of longs 
>> (quadwords). And `vmuludq_reg` in `x86.ad` just takes the immediate operands 
>> and passes them into `vpmuludq`, which doesn't look right...
>
> `vpmuludq` does a long multiplication but throws away the upper 32 bits of 
> each operand lane, effectively computing `(x & max_juint) * (y & max_juint)`.

You can see its pseudocode here: https://www.felixcloutier.com/x86/pmuludq

    VPMULUDQ (VEX.256 Encoded Version)
    DEST[63:0] := SRC1[31:0] * SRC2[31:0]
    DEST[127:64] := SRC1[95:64] * SRC2[95:64]
    DEST[191:128] := SRC1[159:128] * SRC2[159:128]
    DEST[255:192] := SRC1[223:192] * SRC2[223:192]
    DEST[MAXVL-1:256] := 0
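
The pseudocode above can be modelled in scalar C++ as a rough sketch. The names `pmuludq_lane` and `vpmuludq_256` are hypothetical helpers made up for illustration. They are not HotSpot code. The sketch assumes each 64-bit lane is handled independently, as the pseudocode shows.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one 64-bit lane of VPMULUDQ: only the low
// 32 bits of each source lane are used, and the product is a full 64-bit value.
constexpr uint64_t pmuludq_lane(uint64_t src1, uint64_t src2) {
    return (src1 & 0xFFFFFFFFull) * (src2 & 0xFFFFFFFFull);
}

// Hypothetical model of the VEX.256 form: four independent 64-bit lanes.
inline std::array<uint64_t, 4> vpmuludq_256(const std::array<uint64_t, 4>& a,
                                            const std::array<uint64_t, 4>& b) {
    std::array<uint64_t, 4> dest{};
    for (int i = 0; i < 4; i++) {
        dest[i] = pmuludq_lane(a[i], b[i]);
    }
    return dest;
}

// The upper 32 bits of both operands are ignored: only 3 * 5 is computed.
static_assert(pmuludq_lane(0xDEADBEEF00000003ull, 0x1234567800000005ull) == 15,
              "upper halves must not affect the result");

// The product of two 32-bit values never overflows the 64-bit lane.
static_assert(pmuludq_lane(0xFFFFFFFFull, 0xFFFFFFFFull) == 0xFFFFFFFE00000001ull,
              "full 64-bit product is kept");
```

When the upper 32 bits of both inputs are already zero, as with the `URShift ..., 32` and `And ..., 0xFFFFFFFF` operands in the patterns quoted above, an ordinary 64-bit multiply gives the same result as this lane model.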

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805888984
