On Sun, 15 Dec 2024 18:05:02 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
> Hi All,
>
> This patch adds C2 compiler support for various Float16 operations added by
> [PR#22128](https://github.com/openjdk/jdk/pull/22128)
>
> Following is the summary of changes included with this patch:
>
> 1. Detection of various Float16 operations through inline expansion or
>    pattern folding idealizations.
> 2. Float16 operations like add, sub, mul, div, max, and min are inferred
>    through pattern folding idealization.
> 3. Float16 SQRT and FMA operations are inferred through inline expansion, and
>    their corresponding entry points are defined in the newly added Float16Math
>    class.
>    - These intrinsics receive unwrapped short arguments encoding IEEE 754
>      binary16 values.
> 4. New specialized IR nodes for Float16 operations, associated idealizations,
>    and constant folding routines.
> 5. New Ideal type for constant and non-constant Float16 IR nodes. Please
>    refer to
>    [FAQs](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)
>    for more details.
> 6. Since Float16 uses short as its storage type, raw FP16 values are always
>    loaded into general purpose registers, but FP16 ISA instructions generally
>    operate on floating point registers. The compiler therefore injects
>    reinterpretation IR before and after Float16 operation nodes to move the
>    short value to a floating point register and vice versa.
> 7. New idealization routines to optimize redundant reinterpretation chains.
>    HF2S + S2HF = HF
> 8. Auto-vectorization of newly supported scalar operations.
> 9. X86 and AARCH64 backend implementation for all supported intrinsics.
> 10. Functional and Performance validation tests.
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Some FAQs on the newly added ideal type for half-float IR nodes:

Q. Why do we not use the existing TypeInt::SHORT instead of creating a new TypeH type?
A. The newly defined half-float type, TypeH, is special in that its basic type is T_SHORT while its ideal type is RegF.
Thus, the C2 type system views its associated IR node as a 16-bit short value, while the register allocator assigns it a floating-point register.

Q. What is the problem with ConF?
A. During auto-vectorization, ConF replication constrains the operational vector lane count to half of what can otherwise be used for regular Float16 operations, i.e. only 16 floats can be accommodated in a 512-bit vector, which limits the lane count of the vectors in its use-def chain. One possible way to address this is a kludge in the auto-vectorizer that casts such constants to 16-bit constants by analyzing their context. The newly defined Float16 constant nodes, 'ConH', are inherently 16-bit encoded IEEE 754 FP16 values and can be efficiently packed to leverage the full target vector width.

All Float16 IR nodes now carry the newly defined Type::HALF_FLOAT type instead of Type::FLOAT, so we no longer need special handling in the auto-vectorizer to prune their container type to short.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22754#issuecomment-2543982577
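
For readers who want something concrete, here is a minimal Java sketch of two points made above: binary16 values travel as `short` and are operated on by widening to `float` and narrowing back, and a 512-bit vector holds twice as many 16-bit lanes as 32-bit lanes. The class `Float16Sketch` and its methods `addBinary16` and `lanes` are hypothetical names chosen for illustration; they are not code from the patch, and whether C2 folds this exact expression shape is an assumption here. Only `Float.floatToFloat16` and `Float.float16ToFloat` (available since JDK 20) are real library calls.

```java
public class Float16Sketch {
    // Hypothetical helper: adds two IEEE 754 binary16 values stored in shorts
    // by widening both to float, adding, and narrowing the result to binary16.
    static short addBinary16(short a, short b) {
        float sum = Float.float16ToFloat(a) + Float.float16ToFloat(b);
        return Float.floatToFloat16(sum);
    }

    // Hypothetical helper: number of lanes of the given element width
    // that fit in a vector of the given width.
    static int lanes(int vectorBits, int elementBits) {
        return vectorBits / elementBits;
    }

    public static void main(String[] args) {
        short one = Float.floatToFloat16(1.0f);   // binary16 encoding 0x3C00
        short two = Float.floatToFloat16(2.0f);   // binary16 encoding 0x4000
        short three = addBinary16(one, two);      // binary16 encoding 0x4200

        System.out.println(Float.float16ToFloat(three));  // prints 3.0
        System.out.println(lanes(512, Float.SIZE));       // prints 16 (32-bit float lanes)
        System.out.println(lanes(512, Short.SIZE));       // prints 32 (16-bit half-float lanes)
    }
}
```

The lane arithmetic is the reason a 32-bit constant in the use-def chain halves the usable lane count, while a 16-bit constant leaves the full vector width available.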