On Fri, 18 Oct 2024 05:39:08 GMT, Quan Anh Mai <qa...@openjdk.org> wrote:
>> `vpmuludq` does a long multiplication but throws away the upper bits of the >> operands, effectively does a `(x & max_juint) * (y & max_juint)` > > You can see its pseudocode here https://www.felixcloutier.com/x86/pmuludq > > VPMULUDQ (VEX.256 Encoded Version)[ > ΒΆ](https://www.felixcloutier.com/x86/pmuludq#vpmuludq--vex-256-encoded-version-) > DEST[63:0] := SRC1[31:0] * SRC2[31:0] > DEST[127:64] := SRC1[95:64] * SRC2[95:64] > DEST[191:128] := SRC1[159:128] * SRC2[159:128] > DEST[255:192] := SRC1[223:192] * SRC2[223:192] > DEST[MAXVL-1:256] := 0 Got it. Now it makes perfect sense. Thanks for the clarifications! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805894106