On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:

> Hi All,
> 
> This patch adds C2 compiler support for various Float16 operations added by 
> [PR#22128](https://github.com/openjdk/jdk/pull/22128)
> 
> Following is the summary of changes included with this patch:-
> 
> 1. Detection of various Float16 operations through inline expansion or 
> pattern folding idealizations.
> 2. Float16 operations like add, sub, mul, div, max, and min are inferred 
> through pattern folding idealization.
> 3. Float16 SQRT and FMA operation are inferred through inline expansion and 
> their corresponding entry points are defined in the newly added Float16Math 
> class.
>       -    These intrinsics receive unwrapped short arguments encoding IEEE 
> 754 binary16 values.
> 5. New specialized IR nodes for Float16 operations, associated idealizations, 
> and constant folding routines.
> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please 
> refer to [FAQs 
> ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more 
> details.
> 7. Since Float16 uses short as its storage type, hence raw FP16 values are 
> always loaded into general purpose register, but FP16 ISA instructions 
> generally operate over floating point registers, therefore compiler injectes 
> reinterpretation IR before and after Float16 operation nodes to move short 
> value to floating point register and vice versa.
> 8. New idealization routines to optimize redundant reinterpretation chains. 
> HF2S + S2HF = HF
> 6. Auto-vectorization of newly supported scalar operations.
> 7. X86 and AARCH64 backend implementation for all supported intrinsics.
> 9. Functional and Performance validation tests.
> 
> **Missing Pieces:-**
> **-  AARCH64 Backend.**
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin

As I noted on Joe's PR, I like the fact that the intrinsics are decoupled from 
the box class.

I'm now wondering if there is another simplification possible (as I claimed to 
Joe!) which is to reduce the number of intrinsics, ideally down to conversions 
(to and from HF).

For example, `sqrt_float16` is an intrinsic, but I think it could be just an 
invisible IR node.  After inlining the Java definition, you start with an IR 
graph that mentions `sqrtD` and is surrounded by conversion nodes.   Then you 
refactor the IR graph to use `sqrt_float16` directly, presumably with fewer 
conversions (and/or reinterprets).

Same argument for max, min, add, mul, etc.

I'm not saying the current PR is wrong, but I would like to know if it could be 
simplified, either now or later.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2424373685

Reply via email to