On Wed, 15 Jan 2025 00:28:50 GMT, Paul Sandoz <psan...@openjdk.org> wrote:

>> Hi @PaulSandoz ,
>> 
>> In the above code snippet, the return type 'short' of the intrinsic call does
>> not match the value being returned, which is of a box type, thereby mandating
>> additional glue code.
>> 
>> Regular primitive-type boxing APIs are lazily intrinsified, generating an
>> intrinsifiable Call IR node during parsing. LoadNode's idealization can fetch
>> the boxed value from the input of the boxing call IR and forward it directly
>> to its users.
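>>
>> As a rough illustration of that pattern (using Integer as a stand-in, since
>> Float16 is not handled this way today):
>>
>>     static int roundTrip(int x) {
>>         Integer box = Integer.valueOf(x); // lazily intrinsified boxing call
>>         return box.intValue();            // load of box.value is forwarded
>>                                           // from x once the call is expanded
>>     }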
>> 
>> Q1. What is the problem with directly passing Float16 boxes to the FMA and
>> SQRT intrinsic entry points?
>> 
>> A. The compiler would have to unbox them before the actual operation. There
>> are multiple schemes to perform unboxing, such as name-based, offset-based,
>> and index-based field lookup.
>> Vector API unbox expansion uses an offset-based payload field lookup; for
>> this it records the payload field's offset in the runtime representation of
>> the VectorPayload class created as part of VM initialization.
>> However, the VM can only record this information for classes that are part of
>> the java.base module; Float16, being part of an incubator module, cannot use
>> offset-based field lookup. Thus the only viable alternative is to unbox using
>> a field name/index-based lookup.
>> For this, the compiler would first verify that the incoming oop is of Float16
>> type and then use a hardcoded name-based lookup to load the field value. This
>> looks fragile, as it establishes an unwanted dependency between Float16 field
>> names and the compiler implementation. The same applies to index-based
>> lookup, since index values depend on the combined count of class-level and
>> instance fields; any addition or deletion of a class-level static helper
>> field before the field of interest can invalidate a hardcoded index value
>> used by the compiler.
>> All in all, for safe and reliable unboxing by the compiler, it is necessary
>> to create an upfront VM representation like vector_VectorPayload.
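>>
>> For illustration only, a Java-level analogue of the name-based lookup (the
>> real lookup happens inside the compiler, and the field name "value" is an
>> assumption here) shows the fragility, since renaming the field immediately
>> breaks the lookup:
>>
>>     static short nameBasedUnbox(Object f16Box) throws ReflectiveOperationException {
>>         var payload = f16Box.getClass().getDeclaredField("value"); // assumed name
>>         payload.setAccessible(true);
>>         return payload.getShort(f16Box);
>>     }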
>> 
>> Q2. What are the pros and cons of passing both the unboxed and boxed values
>> to the intrinsic entry point?
>> A.
>> Pros:
>> - It avoids an unsafe unboxing implementation when the holder class is not
>> part of the java.base module.
>> - We can leverage the existing box intrinsification infrastructure, which
>> directly forwards the embedded values to their users.
>> - It also minimizes the changes to the Java-side implementation.
>> 
>> Cons:
>> - It is suboptimal when the call is neither intrinsified nor inlined, as it
>> adds additional spills before the call (see the sketch below).
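>>
>> Purely as a sketch (every name below is hypothetical, not an actual entry
>> point), the shape discussed in Q2 would look roughly like this; it makes the
>> extra spills concrete, since three oop arguments must stay live across a
>> call that may end up neither intrinsified nor inlined:
>>
>>     static Object fmaEntry(Object boxA, Object boxB, Object boxC,
>>                            short bitsA, short bitsB, short bitsC) {
>>         return boxC; // placeholder fallback; a real fallback would compute
>>                      // from the bits and box the result
>>     }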
>> 
>> Q3.  Primitive box class boxing API “valueOf” accepts...
>
>> Given the above observations, passing 3 additional box arguments to the
>> intrinsic and returning a box value needs additional changes in the compiler,
>> while minor re-structuring in the Java implementation packed within the glue
>> logic looks like a reasonable approach.
> 
> Did you mean to say *no* additional changes in the compiler? If not, what
> would those changes be?

Hi @PaulSandoz ,

Many thanks for your suggestion. From a compiler standpoint, the changes are
mainly around unboxing/boxing the arguments and return values. Since the
_Float16_ class is defined in an incubator module, we cannot refer to it in the
wrapper class _Float16Math_, which is part of the _java.base_ module. Thus the
lambda would have to pass the boxes as _java.lang.Object_ arguments, which
would introduce additional type-sharpening casts in the lambda and defeat our
purpose of preserving unmodified code in the lambda. The current scheme deals
only with short parameters and return values in the intrinsic and is free from
these concerns.
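To make that concern concrete, a hypothetical Object-based wrapper in
_java.base_ (the names and the functional interface below are assumptions, not
the actual API) would force casts back into the fallback lambda:

    // Hypothetical shape of the rejected alternative:
    interface TernaryOp { Object apply(Object a, Object b, Object c); }

    static Object fma(Object a, Object b, Object c, TernaryOp fallback) {
        return fallback.apply(a, b, c); // each use inside the lambda would
                                        // need explicit (Float16) casts
    }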

A user may write MUL and ADD as separate operations, and to honor the JLS
specification and its precise floating-point semantics we should not infer a
fused scalar FMA operation from that pattern; thus intrinsification is
appropriate here rather than pattern matching. The small float example below
illustrates why fusing can change the result.
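This is only an illustration with float standing in for Float16 (the same
effect shows up at Float16's precision): fusing performs a single rounding,
while the separate multiply and add round twice.

    float a = 1.0f + 0x1.0p-12f;        // 1 + 2^-12, exact in float
    float b = a;
    float c = -(1.0f + 0x1.0p-11f);     // negation of the rounded value of a*b
    float separate = a * b + c;         // 0.0f: a*b is rounded before the add
    float fused    = Math.fma(a, b, c); // 0x1.0p-24f: single rounding of a*b + c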

With this incremental commit, I am trying to minimize the Java-side changes.
Let me know if this looks fine to you.

I have verified that the auto-vectorization support planned for a follow-on
patch is aligned with these changes.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1920398466
