tra added inline comments.
================
Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:604
  def int_nvvm_f # operation # variant :
      ClangBuiltin<!strconcat("__nvvm_f", operation, variant)>,
      DefaultAttrsIntrinsic<[llvm_i16_ty], [llvm_i16_ty, llvm_i16_ty],
----------------
tra wrote:
> Availability of these new instructions is conditional on the specific CUDA
> version and the GPU variant we're compiling for.
> Such builtins are normally implemented on the clang side as a
> `TARGET_BUILTIN()` with appropriate constraints.
>
> Without that, `ClangBuiltin` may automatically add enough glue to make them
> available in clang unconditionally, which would result in the compiler crashing
> if a user tries to use one of those builtins with a wrong GPU or CUDA
> version. We want to emit a diagnostic, not cause a compiler crash.
>
> Usually such related LLVM and clang changes should be part of the same patch.
>
> This applies to the new intrinsic variants added below, too.

I do not think this is done. Can you check what happens if you try to call any of the bf16 builtins while compiling for sm_60? Ideally we should produce a sensible error that the builtin is not available. I suspect we will fail in LLVM when we fail to lower the intrinsic, or in nvptx if we've managed to lower it to an instruction unsupported by sm_60.
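For reference, a conditionally-available NVPTX builtin on the clang side is declared along these lines in `clang/include/clang/Basic/BuiltinsNVPTX.def` (a sketch only -- the builtin name, type string, and feature constraint shown here are illustrative, not taken from this patch):

```cpp
// Sketch: gate the builtin on both a minimum SM arch and a minimum PTX ISA
// version, so clang emits a "builtin requires target feature" diagnostic
// instead of letting an unsupported intrinsic reach the backend.
// AND/SM_80/PTX70 are the feature-string helpers BuiltinsNVPTX.def defines.
TARGET_BUILTIN(__nvvm_fmin_bf16, "UsUsUs", "", AND(SM_80, PTX70))
```

With that in place, calling the builtin while compiling for sm_60 fails in Sema with a proper diagnostic rather than crashing in instruction selection.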
================
Comment at: llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp:1297-1304
   if (EltVT == MVT::f16 && N->getValueType(0) == MVT::v2f16) {
     assert(NumElts % 2 == 0 && "Vector must have even number of elements");
     EltVT = MVT::v2f16;
     NumElts /= 2;
+  } else if (EltVT == MVT::bf16 && N->getValueType(0) == MVT::v2bf16) {
+    assert(NumElts % 2 == 0 && "Vector must have even number of elements");
+    EltVT = MVT::v2bf16;
----------------
These could be collapsed into
```
if ((EltVT == MVT::f16 && N->getValueType(0) == MVT::v2f16) ||
    (EltVT == MVT::bf16 && N->getValueType(0) == MVT::v2bf16)) {
  assert(NumElts % 2 == 0 && "Vector must have even number of elements");
  EltVT = N->getValueType(0);
  NumElts /= 2;
}
```

================
Comment at: llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp:147-153
+  switch (VT.SimpleTy) {
+  default:
+    return false;
+  case MVT::v2f16:
+  case MVT::v2bf16:
+    return true;
+  }
----------------
It can be simplified to just
`return (VT.SimpleTy == MVT::v2f16 || VT.SimpleTy == MVT::v2bf16);`

================
Comment at: llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp:156
+
+static bool Isf16Orbf16Type(MVT VT) {
+  switch (VT.SimpleTy) {
----------------
Ditto.

================
Comment at: llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp:623
+  for (const auto &Op : {ISD::FADD, ISD::FMUL, ISD::FSUB, ISD::FMA}) {
+    setBF16OperationAction(Op, MVT::bf16, Legal, Promote);
----------------
Fold it into the loop above.

================
Comment at: llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp:699
   }
+  for (const auto &Op : {ISD::FMINNUM, ISD::FMAXNUM}) {
+    setBF16OperationAction(Op, MVT::bf16, GetMinMaxAction(Promote), Promote);
----------------
Fold this into the loop processing `{ISD::FMINNUM, ISD::FMAXNUM}` above.

Also, do we want/need to add bf16 handling for `{ISD::FMINIMUM, ISD::FMAXIMUM}` too? LLVM's choice of the constants `FMINIMUM` vs `FMINNUM` is rather unfortunate -- it's so easy to misread one for the other.
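Concretely, the folding suggested above might look something like the following sketch (assuming the `setFP16OperationAction`, `setBF16OperationAction`, and `GetMinMaxAction` helpers already present in NVPTXISelLowering.cpp; the exact actions are taken from the existing f16 handling):

```cpp
// Sketch: handle f16 and bf16 min/max in one loop instead of two.
// Scalars promote when the target lacks native min/max support; the
// vector forms expand, mirroring the existing f16 behavior.
for (const auto &Op : {ISD::FMINNUM, ISD::FMAXNUM}) {
  setFP16OperationAction(Op, MVT::f16, GetMinMaxAction(Promote), Promote);
  setFP16OperationAction(Op, MVT::v2f16, GetMinMaxAction(Expand), Expand);
  setBF16OperationAction(Op, MVT::bf16, GetMinMaxAction(Promote), Promote);
  setBF16OperationAction(Op, MVT::v2bf16, GetMinMaxAction(Expand), Expand);
}
```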
================
Comment at: llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp:700-703
+    setBF16OperationAction(Op, MVT::bf16, GetMinMaxAction(Promote), Promote);
+    setBF16OperationAction(Op, MVT::v2bf16, GetMinMaxAction(Expand), Expand);
+    setBF16OperationAction(Op, MVT::bf16, GetMinMaxAction(Expand), Expand);
+    setBF16OperationAction(Op, MVT::v2bf16, GetMinMaxAction(Expand), Expand);
----------------
I'm not sure what's going on here. Should it be Promote for bf16 and Expand for v2bf16? Why do we have two other entries, one of them trying to Expand bf16?

================
Comment at: llvm/lib/Target/NVPTX/NVPTXRegisterInfo.td:65-66
+def Float16x2Regs : NVPTXRegClass<[v2f16], 32, (add (sequence "HH%u", 0, 4))>;
+def BFloat16Regs : NVPTXRegClass<[bf16], 16, (add (sequence "H%u", 0, 4))>;
+def BFloat16x2Regs : NVPTXRegClass<[v2bf16], 32, (add (sequence "HH%u", 0, 4))>;
 def Float32Regs : NVPTXRegClass<[f32], 32, (add (sequence "F%u", 0, 4))>;
----------------
I suspect this may be a problem. What PTX do we end up generating if we have a function that needs to use both f16 and bf16 registers? I suspect we may end up defining conflicting sets of registers.

I still do not think that we need a separate register class for bf16; both bf16 and fp16 should be using generic opaque 16/32-bit register types (or, even better, generic Int16/Int32 registers). RegClass accepts multiple type values, so it may be as simple as using
`def Int16Regs : NVPTXRegClass<[i16,f16,bf16], 16, (add (sequence "RS%u", 0, 4))>;`
and adjusting the existing use cases. That should probably be done as a separate patch, as there would be a lot of churn unrelated to bf16 support. I'll check if it's doable next week.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144911/new/

https://reviews.llvm.org/D144911

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits