On Tue, 8 Jul 2025 10:33:50 GMT, Fei Gao <[email protected]> wrote:
>>> > > > > https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java#L388-L392
>>> > > >
>>> > > >
>>> > > > Actually I didn't change the min vector size for `char` vectors in
>>> > > > this patch. Relaxing `short` vectors to 32-bit is to support the
>>> > > > vector cast for Vector API, and there is no `char` species in it. Do
>>> > > > you think it's better to do the same change for `char` as well? This
>>> > > > will just benefit auto-vectorization.
>>> > >
>>> > >
>>> > > Hi @XiaohongGong thanks for asking. In many auto-vectorization cases
>>> > > involving `char`, the vector elements are represented using `T_SHORT`
>>> > > as the `BasicType`, rather than `T_CHAR`.
>>> > > This is because, in Java, operands of subword types are always promoted
>>> > > to `int` before any arithmetic operation. As a result, when handling a
>>> > > node like `ConvD2I`, we don’t initially know its actual subword type.
>>> > > Later, the SuperWord phase propagates a narrowed integer type backward
>>> > > to help determine the correct subword type. See:
>>> > > https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/share/opto/superword.cpp#L2551-L2558
>>> > >
>>> > > Since SuperWord assigns `T_SHORT` to `StoreC` early on
>>> > > https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/share/opto/superword.cpp#L2646-L2650
>>> > >
>>> > > the entire propagation chain tends to use `T_SHORT` as well.
>>> > > This applies to most operations, with the exception of a few like
>>> > > `RShiftI`, `Abs`, and `ReverseBytesI`, which are handled separately.
>>> > > So your change already benefits many char-related vectorization cases
>>> > > like `convertDoubleToChar` above. That’s why we can safely relax the IR
>>> > > condition mentioned earlier.
>>> >
>>> >
>>> > Thanks for your input! It's really helpful to me. Does this mean it
>>> > always use `T_SHORT` for char vectors in SLP? If so, it's safe that we do
>>> > not need to consider `T_CHAR` in vector IRs in backend?
>>>
>>> No, we don't always use `T_SHORT` for char vectors. As mentioned earlier,
>>> for operations like `RShiftI`, `Abs`, and `ReverseBytesI`, the compiler
>>> needs to preserve the higher-order bits of the first operand. Therefore,
>>> SuperWord still needs to assign them precise subword types. See:
>>>
>>> https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/share/opto/superword.cpp#L2583-L2589
>>
>> Yes, I see. Thanks! What I mean is for cases that SLP will use the sub...
>
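To make the quoted point about precise subword types concrete, here is a small standalone Java sketch. The class and method names are hypothetical and are not taken from the patch or the jtreg tests. It only illustrates the Java-level semantics: after promotion to `int`, a right shift depends on whether the 16-bit value was sign-extended (`short`) or zero-extended (`char`), while an add gives the same low 16 bits either way.

```java
public class SubwordShiftSketch {
    // Right shift of a value held as short: promotion to int sign-extends.
    static int shiftAsShort(int bits) {
        short s = (short) bits;
        return (short) (s >> 1);
    }

    // Right shift of a value held as char: promotion to int zero-extends.
    static int shiftAsChar(int bits) {
        char c = (char) bits;
        return (char) (c >> 1);
    }

    // Low 16 bits of an add on a short value.
    static int addLow16AsShort(int bits) {
        short s = (short) bits;
        return ((short) (s + 1)) & 0xFFFF;
    }

    // Low 16 bits of an add on a char value.
    static int addLow16AsChar(int bits) {
        char c = (char) bits;
        return ((char) (c + 1)) & 0xFFFF;
    }

    public static void main(String[] args) {
        int bits = 0x8000; // top bit of the 16-bit lane is set
        System.out.println(shiftAsShort(bits) & 0xFFFF); // low 16 bits after a signed-lane shift
        System.out.println(shiftAsChar(bits) & 0xFFFF);  // low 16 bits after an unsigned-lane shift
        System.out.println(addLow16AsShort(bits) == addLow16AsChar(bits));
    }
}
```

For `0x8000` the two shifts leave different low 16 bits, whereas the two adds agree, which matches the explanation that most operations can share `T_SHORT` but the shift-like ones cannot.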
@fg1417, there is a performance regression of `D -> S` on NEON for SLP. I've
disabled that case in the latest change. Here is the performance data of the JMH
`TypeVectorOperations` benchmark on Grace (the 128-bit SVE machine) and N1
(NEON), respectively. Ratio is Before / After, so values above 1.00 mean the
patched build is faster:
Grace:

Benchmark                                 COUNT  Mode  Unit   Before      After       Ratio
TypeVectorOperationsSuperWord.convertD2S  512    avgt  ns/op  155.667433  123.222497  1.26
TypeVectorOperationsSuperWord.convertD2S  2048   avgt  ns/op  622.262384  489.336020  1.27
TypeVectorOperationsSuperWord.convertL2S  512    avgt  ns/op   93.173939   63.557134  1.46
TypeVectorOperationsSuperWord.convertL2S  2048   avgt  ns/op  365.287938  239.726941  1.52
TypeVectorOperationsSuperWord.convertS2D  512    avgt  ns/op  157.096344  147.560047  1.06
TypeVectorOperationsSuperWord.convertS2D  2048   avgt  ns/op  627.039963  614.748559  1.01
TypeVectorOperationsSuperWord.convertS2L  512    avgt  ns/op  111.752970  108.629240  1.02
TypeVectorOperationsSuperWord.convertS2L  2048   avgt  ns/op  441.312737  441.088523  1.00

N1:

Benchmark                                 COUNT  Mode  Unit   Before      After       Ratio
TypeVectorOperationsSuperWord.convertD2S  512    avgt  ns/op  215.353528  214.769884  1.00
TypeVectorOperationsSuperWord.convertD2S  2048   avgt  ns/op  958.428871  952.922855  1.00
TypeVectorOperationsSuperWord.convertL2S  512    avgt  ns/op  158.000190  142.647209  1.10
TypeVectorOperationsSuperWord.convertL2S  2048   avgt  ns/op  612.525835  532.023419  1.15
TypeVectorOperationsSuperWord.convertS2D  512    avgt  ns/op  209.993363  210.466401  0.99
TypeVectorOperationsSuperWord.convertS2D  2048   avgt  ns/op  819.181052  803.601170  1.01
TypeVectorOperationsSuperWord.convertS2L  512    avgt  ns/op  217.848273  182.680450  1.19
TypeVectorOperationsSuperWord.convertS2L  2048   avgt  ns/op  858.031089  695.502377  1.23
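
For readers who don't have the benchmark source at hand, the loops being measured are scalar conversion loops of roughly the following shape. This is only a sketch under that assumption: the class name, method names, and signatures are hypothetical, and the real JMH kernels in `TypeVectorOperations` may differ in detail. The `double -> char` variant is included because it is the shape of the `convertDoubleToChar` case discussed above.

```java
public class ConvertLoopSketch {
    // double -> short: each element goes through d2i and then a narrowing to 16 bits.
    static void convertD2S(double[] src, short[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (short) src[i];
        }
    }

    // short -> long: a widening conversion with sign extension.
    static void convertS2L(short[] src, long[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
    }

    // double -> char: same narrowing path as D2S, but the result is stored as unsigned 16-bit.
    static void convertD2C(double[] src, char[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (char) src[i];
        }
    }

    public static void main(String[] args) {
        double[] src = {3.7, -1.0, 70000.0};
        short[] shorts = new short[src.length];
        char[] chars = new char[src.length];
        convertD2S(src, shorts);
        convertD2C(src, chars);
        for (int i = 0; i < src.length; i++) {
            System.out.println(src[i] + " -> short " + shorts[i] + ", char " + (int) chars[i]);
        }
    }
}
```

With 128-bit vectors, two `double` lanes narrow to two 16-bit lanes, i.e. a 32-bit `short` vector, which is why the minimum `short` vector size matters for loops like these.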
-------------
PR Comment: https://git.openjdk.org/jdk/pull/26057#issuecomment-3050738693