On Tue, 10 Dec 2024 07:44:57 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>> Quan Anh Mai has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>>   Change wording on VectorLoadShuffleNode
>>
>>   Co-authored-by: Jatin Bhateja <jatin.bhat...@intel.com>
>
> src/hotspot/share/opto/library_call.hpp line 358:
>
>> 356:   bool inline_vector_shuffle_to_vector();
>> 357:   bool inline_vector_wrap_shuffle_indexes();
>> 358:   bool inline_vector_shuffle_iota();
>
> FTR, x86 ISA does not support a direct byte multiplier instruction, so we
> first unpack to a short vector, multiply at a short granularity, and then
> pack it back to a byte vector. This was somewhat costly. Since the shuffle
> backing storage now matches the lane size of the corresponding vector, the
> performance of iota computation with a non-unit scalar should improve.

I believe with the type information of vector elements this optimization should be trivial.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1877566967
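
For illustration only, here is a scalar Java sketch of the unpack / multiply / pack sequence described in the quoted comment. It is not the HotSpot implementation; the class name `ByteIotaSketch`, both method names, and the `start` and `scale` parameters are hypothetical and chosen just for this example. It models one lane at a time and shows that widening to short, multiplying, and truncating back to byte gives the same lanes as a direct wrapping byte multiply would.

```java
// Hypothetical scalar model of a byte-lane iota computation: lane i = start + i * scale.
// All names here are assumptions made for this sketch, not JDK or HotSpot identifiers.
class ByteIotaSketch {

    // Models the three-step sequence: unpack to short, multiply as short, pack to byte.
    static byte[] iotaViaShort(int lanes, byte start, byte scale) {
        byte[] out = new byte[lanes];
        for (int i = 0; i < lanes; i++) {
            short wide = (short) i;                  // unpack: widen the lane index to short
            short product = (short) (wide * scale);  // multiply at short granularity
            out[i] = (byte) (product + start);       // pack: truncate back to a byte lane
        }
        return out;
    }

    // Models what a direct byte multiply would compute (wrapping to 8 bits).
    static byte[] iotaDirect(int lanes, byte start, byte scale) {
        byte[] out = new byte[lanes];
        for (int i = 0; i < lanes; i++) {
            out[i] = (byte) (i * scale + start);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] viaShort = iotaViaShort(16, (byte) 1, (byte) 3);
        byte[] direct = iotaDirect(16, (byte) 1, (byte) 3);
        System.out.println(java.util.Arrays.toString(viaShort));
        System.out.println(java.util.Arrays.equals(viaShort, direct));
    }
}
```

Both methods agree because only the low 8 bits of the product survive the final narrowing, so the extra widening and narrowing steps add cost without changing the result.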