Re: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v5]

Emanuel Peter Mon, 12 Jan 2026 00:51:48 -0800

On Mon, 12 Jan 2026 08:13:04 GMT, Bhavana Kilambi <[email protected]> wrote:


>> test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line 459:
>> 
>>> 457:         short result = (short) 0;
>>> 458:         for (int i = 0; i < LEN; i++) {
>>> 459:             result = float16ToRawShortBits(add(shortBitsToFloat16(result), shortBitsToFloat16(input1[i])));
>> 
>> Why all the conversions from and to `short` / `Float16`?
>> Is there any benefit to using `short` for the intermediate results? Why not
>> make `result` a `Float16`?
>
> If I remember correctly, I tried doing that initially, but the loop did not
> get vectorized. The Ideal graph showed a lot of nodes related to object
> creation (probably for the intermediate `Float16` result), which bloated the
> loop so that it was not unrolled (and therefore not vectorized). I also tried
> a standalone loop that does not return the intermediate result, hoping that
> escape analysis could eliminate the object creation, but that did not help
> either.

Hmm, I see. That sounds like a deficiency in the auto unboxing of Float16.

Suggestion: You should create both variants of the IR tests, and then file an
RFE for the one that does not yet vectorize because of the boxing issues.

As things stand now, it's not a huge win, to be honest. Which user is
supposed to write their code in such a convoluted way, having to convert back
and forth? Would they not expect to be able to just use `Float16` all the way
through?
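For concreteness, here is a minimal sketch of the two loop shapes under discussion. The runnable part carries the accumulator as raw binary16 bits in a `short`, as the test does today, so each iteration only moves a primitive and there is nothing to box. To keep it self-contained, it assumes JDK 20+ and swaps the incubating `Float16` class for `java.lang.Float.float16ToFloat` / `Float.floatToFloat16`. It also assumes that one `float` add or multiply followed by rounding to binary16 matches the corresponding `Float16` operation. The class and method names are hypothetical; this is not the PR's test code.

```java
// Minimal sketch, assuming JDK 20+ (Float.float16ToFloat / Float.floatToFloat16).
// Hypothetical class and method names; not the code from the PR.
public class Fp16ReductionSketch {

    // Add reduction with the accumulator carried as raw binary16 bits (short):
    // widen both operands to float, add, then round back to binary16 each step.
    static short addReduceBits(short[] input) {
        short result = (short) 0; // +0.0 in binary16
        for (int i = 0; i < input.length; i++) {
            float sum = Float.float16ToFloat(result) + Float.float16ToFloat(input[i]);
            result = Float.floatToFloat16(sum); // round-to-nearest-even to binary16
        }
        return result;
    }

    // Mul reduction in the same style; 0x3C00 is the bit pattern of 1.0 in binary16.
    static short mulReduceBits(short[] input) {
        short result = (short) 0x3C00;
        for (int i = 0; i < input.length; i++) {
            float prod = Float.float16ToFloat(result) * Float.float16ToFloat(input[i]);
            result = Float.floatToFloat16(prod);
        }
        return result;
    }

    // The shape a user would more naturally write with the incubating Float16 class
    // (assumed API, as used in the test via static imports; needs
    // --add-modules jdk.incubator.vector), shown only as a comment:
    //
    //   Float16 acc = Float16.shortBitsToFloat16((short) 0);
    //   for (int i = 0; i < input.length; i++) {
    //       acc = Float16.add(acc, Float16.shortBitsToFloat16(input[i]));
    //   }

    public static void main(String[] args) {
        short[] in = new short[4];
        for (int i = 0; i < in.length; i++) {
            in[i] = Float.floatToFloat16(i + 1); // 1.0, 2.0, 3.0, 4.0
        }
        System.out.println(Float.float16ToFloat(addReduceBits(in))); // 10.0
        System.out.println(Float.float16ToFloat(mulReduceBits(in))); // 24.0
    }
}
```

The commented-out loop is the `Float16`-typed variant that, per the discussion above, does not yet vectorize because of boxing. It is the natural candidate for the second IR test and the RFE.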

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2681318247
