On Tue, 10 Dec 2024 07:44:57 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:

>> Quan Anh Mai has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Change wording on VectorLoadShuffleNode
>>   
>>   Co-authored-by: Jatin Bhateja <jatin.bhat...@intel.com>
>
> src/hotspot/share/opto/library_call.hpp line 358:
> 
>> 356:   bool inline_vector_shuffle_to_vector();
>> 357:   bool inline_vector_wrap_shuffle_indexes();
>> 358:   bool inline_vector_shuffle_iota();
> 
> FTR, the x86 ISA has no direct byte multiply instruction, so we first 
> unpack to a short vector, multiply at short granularity, and then pack the 
> result back into a byte vector. This was somewhat costly. Since the shuffle 
> backing storage now matches the lane size of the corresponding vector, the 
> performance of iota computation with a non-unit scalar should improve.

I believe that, with the type information of the vector elements available, 
this optimization should be trivial.
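
For readers following along, here is a minimal scalar sketch of the 
widen-multiply-narrow sequence described in the quoted comment. It is only an 
illustration, not the HotSpot code: the class and method names 
(ByteIotaSketch, mulBytesViaShort, iota) are hypothetical, the narrowing step 
uses plain truncation (the real pack instructions may saturate, so the 
generated code would have to account for that), and index wrapping is omitted.

```java
import java.util.Arrays;

public class ByteIotaSketch {
    // Hypothetical helper: multiply byte lanes by going through short lanes,
    // mirroring the unpack / multiply / pack sequence in scalar form.
    public static byte[] mulBytesViaShort(byte[] lanes, byte scale) {
        short[] wide = new short[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            wide[i] = lanes[i];                     // "unpack": widen byte to short
        }
        for (int i = 0; i < wide.length; i++) {
            wide[i] = (short) (wide[i] * scale);    // multiply at short granularity
        }
        byte[] out = new byte[lanes.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) wide[i];                // "pack": keep the low 8 bits
        }
        return out;
    }

    // Hypothetical iota: lane i holds start + i * scale, in byte arithmetic.
    public static byte[] iota(int length, int start, int scale) {
        byte[] lanes = new byte[length];
        for (int i = 0; i < length; i++) {
            lanes[i] = (byte) i;
        }
        byte[] scaled = mulBytesViaShort(lanes, (byte) scale);
        for (int i = 0; i < length; i++) {
            scaled[i] = (byte) (scaled[i] + start);
        }
        return scaled;
    }

    public static void main(String[] args) {
        // prints [1, 4, 7, 10, 13, 16, 19, 22]
        System.out.println(Arrays.toString(iota(8, 1, 3)));
    }
}
```

When the backing storage has the same lane size as the vector (for example 
int lanes for an int vector), the two conversion loops disappear and only the 
multiply and add remain, which is the saving discussed above.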

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1877566967
