mibintc created this revision. mibintc added reviewers: andrew.w.kaylor, pengfei, kbsmith1. Herald added subscribers: dexonsmith, jfb, hiraditya. mibintc requested review of this revision. Herald added a subscriber: jdoerfert. Herald added a project: LLVM.
This is a proposal to add a new llvm intrinsic, llvm.arith.fence. The purpose is to provide fine control, at the expression level, over floating point optimization when -ffast-math (-ffp-model=fast) is enabled. We are also proposing a new clang builtin that provides access to this intrinsic, as well as a new clang command line option `-fprotect-parens` that will be implemented using this intrinsic.

This patch is authored by @pengfei.

Rationale
---------

Some expression transformations that are mathematically correct, such as reassociation and distribution, may be incorrect when dealing with finite precision floating point. For example, these two expressions

```
(a + b) + c
a + (b + c)
```

are equivalent in exact arithmetic (and in integer arithmetic), but not in floating point, where each intermediate result is rounded.

In some floating point (FP) models, the compiler is allowed to make these value-unsafe transformations for performance reasons, even when the programmer uses parentheses explicitly. But the compiler must always honor the parentheses implied by llvm.arith.fence, regardless of the FP model settings.

Under `-ffp-model=fast`, llvm.arith.fence provides a way to partially enforce ordering in an FP expression.

| Original expression         | Transformed expression | Permitted? |
| --------------------------- | ---------------------- | ---------- |
| (a + b) + c                 | a + (b + c)            | Yes!       |
| llvm.arith.fence(a + b) + c | a + (b + c)            | No!        |

NOTE: The llvm.arith.fence serves no purpose in value-safe FP modes like `-ffp-model=precise`: FP expressions are already strictly ordered.

The new llvm intrinsic also enables the implementation of the option `-fprotect-parens`, which is available in gfortran as well as in the Intel C++ and Fortran compilers, icc and ifort.

Proposed llvm IR changes
------------------------

Requirements for llvm.arith.fence:

- There is one operand. The input to the intrinsic is an llvm::Value and must be scalar floating point or vector floating point.
- The return type is the same as the operand type.
- The return value is equivalent to the operand.

Optimizing llvm.arith.fence
---------------------------

- Constant folding may substitute the constant value of the llvm.arith.fence operand for the value of the fence itself when the operand is constant.
- CSE detection: no special changes are needed. If E1 and E2 are CSE, then llvm.arith.fence(E1) and llvm.arith.fence(E2) are CSE.
- FMA transformation should be enabled, at least in the -ffp-model=fast case.
  - The expression `llvm.arith.fence(a * b) + c` means that `a * b` must happen before `+ c`. FMA guarantees that, but to prevent later optimizations from unpacking the FMA, the correct transformation needs to be `llvm.arith.fence(a * b) + c → llvm.arith.fence(FMA(a, b, c))`.
  - In the -ffp-model=fast case, FMA formation doesn't happen until ISel, so we just need to add the llvm.arith.fence cases to ISel pattern matching.
  - There are some choices around the FMA optimization. For this example:

    ```
    %t1 = fmul double %x, %y
    %t2 = call double @llvm.arith.fence.f64(double %t1)
    %t3 = fadd contract double %t2, %z
    ```

    1. FMA is allowed across an arith.fence if and only if the FMF `contract` flag is set for the llvm.arith.fence operand. //We are recommending this choice.//
    2. FMA is not allowed across a fence.
    3. The FMF `contract` flag should be set on the llvm.arith.fence intrinsic call if contraction should be enabled.
- Fast Math Optimization:
  - The result of an llvm.arith.fence can participate in fast math optimizations. For example, this transformation is legal:

    ```
    w + llvm.arith.fence(x + y) + z → w + z + llvm.arith.fence(x + y)
    ```

  - The operand of an llvm.arith.fence can participate in fast math optimizations. For example, this transformation is legal:

    ```
    llvm.arith.fence((x + y) + z) → llvm.arith.fence(x + (y + z))
    ```

NOTE: We want fast-math optimization within the fence, but not across the fence.
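The "Fast Math Optimization" rules can be illustrated with a toy reassociation pass. This is a minimal C++ sketch over a hypothetical expression tree. The names `Expr`, `var`, `add`, `fence`, `str`, `collect` and `reassociate` are for illustration only; they are not LLVM APIs, and this is not LLVM's Reassociate pass.

The pass flattens chains of additions and sorts the terms into a canonical order. It treats each fence as an opaque term: the fence's own operand is reassociated independently, so optimization happens within the fence but never across it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy expression tree; it only models the legality rules above.
struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr {
  enum class Kind { Var, Add, Fence };
  Kind kind;
  std::string name;  // set for Var
  ExprPtr lhs;       // first operand of Add, sole operand of Fence
  ExprPtr rhs;       // second operand of Add
};

ExprPtr var(std::string n) {
  return std::make_shared<Expr>(Expr{Expr::Kind::Var, std::move(n), nullptr, nullptr});
}
ExprPtr add(ExprPtr a, ExprPtr b) {
  return std::make_shared<Expr>(Expr{Expr::Kind::Add, "", std::move(a), std::move(b)});
}
ExprPtr fence(ExprPtr a) {
  return std::make_shared<Expr>(Expr{Expr::Kind::Fence, "", std::move(a), nullptr});
}

// Print with explicit parentheses so the association order is visible.
std::string str(const ExprPtr& e) {
  switch (e->kind) {
    case Expr::Kind::Var:
      return e->name;
    case Expr::Kind::Add:
      return "(" + str(e->lhs) + " + " + str(e->rhs) + ")";
    case Expr::Kind::Fence: {
      std::string inner = str(e->lhs);
      // An Add operand already prints its own parentheses.
      return e->lhs->kind == Expr::Kind::Add ? "fence" + inner : "fence(" + inner + ")";
    }
  }
  return "";
}

ExprPtr reassociate(const ExprPtr& e);

// Flatten a chain of Adds into its terms. A Fence is an opaque term: its
// operand is reassociated on its own and never merged with outer terms.
void collect(const ExprPtr& e, std::vector<ExprPtr>& terms) {
  if (e->kind == Expr::Kind::Add) {
    collect(e->lhs, terms);
    collect(e->rhs, terms);
  } else if (e->kind == Expr::Kind::Fence) {
    terms.push_back(fence(reassociate(e->lhs)));
  } else {
    terms.push_back(e);
  }
}

// Fast-math style reassociation: sort the terms (variables by name,
// fenced terms last) and rebuild a left-leaning chain of Adds.
ExprPtr reassociate(const ExprPtr& e) {
  std::vector<ExprPtr> terms;
  collect(e, terms);
  assert(!terms.empty());
  std::stable_sort(terms.begin(), terms.end(), [](const ExprPtr& a, const ExprPtr& b) {
    bool fa = a->kind == Expr::Kind::Fence;
    bool fb = b->kind == Expr::Kind::Fence;
    if (fa != fb) return fb;            // unfenced terms sort first
    if (!fa) return a->name < b->name;  // variables alphabetically
    return false;                       // fences keep their relative order
  });
  ExprPtr result = terms[0];
  for (std::size_t i = 1; i < terms.size(); ++i) result = add(result, terms[i]);
  return result;
}
```

The pass reproduces both legal transformations from the list:

- `w + fence(x + y) + z` becomes `((w + z) + fence(x + y))`.
- `fence(z + (y + x))` becomes `fence((x + y) + z)`.

In `a + fence(b + a)`, the inner terms are reordered to `(a + fence(a + b))`. The two `a` terms are never combined, because that would require optimizing across the fence.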
- MIR Optimization:
  - The use of a pseudo-operation in the MIR serves the same purpose as the intrinsic in the IR, since all the optimizations are based on pattern matching of known DAGs/MIs.
  - The backend simply respects the llvm.arith.fence intrinsic: it builds an llvm.arith.fence node during DAG construction/ISel and then emits a pseudo arithmetic_fence MI.
  - The pseudo arithmetic_fence MI turns into a comment when emitting assembly.

Other llvm changes needed -- utility functions
----------------------------------------------

The ValueTracking utilities will need to be taught to handle the new intrinsic. For example, utility functions like `isKnownNeverNaN()` and `CannotBeOrderedLessThanZero()` will need to "look through" the intrinsic.

A simple example
----------------

```
; llvm IR, llvm.arith.fence over addition.
%4 = load double, double* %A, align 8
%5 = load double, double* %B, align 8
%add1 = fadd double %4, %5
%6 = call double @llvm.arith.fence.f64(double %add1)
%7 = load double, double* %C, align 8
%mul = fmul double %6, %7
store double %mul, double* %A, align 8
```

Example, llvm.arith.fence over memory operand
---------------------------------------------

Consider this similar example, which illustrates how `x` can be optimized while `z` is fenced. Notice that `q` is simplified to `b` (q = a + b - a → q = b), but `z` isn't simplified because of the fence.

```
; llvm IR
define dso_local float @f(float %a, float %b) local_unnamed_addr #0 {
  %x = fadd fast float %b, %a
  %tmp = call fast float @llvm.arith.fence.f32(float %x)
  %z = fsub fast float %tmp, %a
  %result = call fast float @llvm.maxnum.f32(float %z, float %b)
  ret float %result
}
```

Clang changes to take advantage of this intrinsic
-------------------------------------------------

- Add new clang builtin `__arithmetic_fence`
  - Add builtin definition
    - There is one operand: any kind of expression, including a memory operand.
    - The return type is the same as the operand type. The result of the intrinsic is the value of its rvalue operand.
    - The operand type can be any scalar floating point type, complex, or vector with float or complex element type.
    - The invocation of `__arithmetic_fence` is not a C/C++ constant expression, even if the operands are constant.
  - Add semantic checks and test cases
  - Modify clang/codegen to generate the llvm.arith.fence intrinsic
- Add support for a new command-line option `-fprotect-parens`, which honors parentheses within a floating point expression. The default is `-fno-protect-parens`.

For example:

```
// Compile with -ffast-math
double A, B, C;
A = __arithmetic_fence(A + B) * C;
```

```
; llvm IR
%4 = load double, double* %A, align 8
%5 = load double, double* %B, align 8
%add1 = fadd double %4, %5
%6 = call double @llvm.arith.fence.f64(double %add1)
%7 = load double, double* %C, align 8
%mul = fmul double %6, %7
store double %mul, double* %A, align 8
```

- Motivation: the new clang builtin provides clang compatibility with the Intel C++ compiler builtin `__fence`, which has similar semantics, and likewise enables implementation of the option `-fprotect-parens`. The new builtin gives the clang programmer control over floating point optimizations at the expression level.

Pros & Cons
-----------

1. Pros
   - Increases expressiveness and gives precise control over floating point calculations.
   - Provides a desirable compatibility feature from industrial compilers.
2. Cons
   - Intrinsic bloat.
   - Some of LLVM's optimizations need to understand the llvm.arith.fence semantics in order to retain their optimization capabilities. This will require at least some engineering effort.
   - Any target that wants to support this has to modify its back end.
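Compensated (Kahan) summation shows where the proposed `__arithmetic_fence` builtin would matter. Its correction term is algebraically zero, so reassociation under -ffast-math can delete it. The sketch below is hypothetical in two respects:

- The `FENCE` macro maps to `__arithmetic_fence` only when Clang reports the builtin through `__has_builtin` on an x86 target. Restricting it to x86 is a conservative assumption.
- Otherwise `FENCE` falls back to the identity, which is correct when not compiling with -ffast-math.

Because a fence's operand may itself still be optimized, both the rounded sum and the difference `t - sum` are fenced.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: use the proposed builtin when the compiler reports it,
// otherwise fall back to the identity (adequate without -ffast-math).
#if defined(__clang__) && defined(__has_builtin)
#if __has_builtin(__arithmetic_fence) && (defined(__x86_64__) || defined(__i386__))
#define FENCE(x) __arithmetic_fence(x)
#endif
#endif
#ifndef FENCE
#define FENCE(x) (x)
#endif

// Plain left-to-right summation.
double naive_sum(const std::vector<double>& xs) {
  double sum = 0.0;
  for (double x : xs) sum += x;
  return sum;
}

// Kahan summation. Without fences, fast-math reassociation could rewrite
// c = ((sum + y) - sum) - y to 0 and silently drop the correction.
double kahan_sum(const std::vector<double>& xs) {
  double sum = 0.0;
  double c = 0.0;  // running compensation for lost low-order bits
  for (double x : xs) {
    double y = x - c;
    // Fence the rounded sum so uses of t cannot be rewritten as sum + y.
    double t = FENCE(sum + y);
    // Fence (t - sum) as well: its result is an opaque value, so the
    // surrounding "- y" cannot be folded into it.
    c = FENCE(t - sum) - y;
    sum = t;
  }
  return sum;
}
```

Take `xs` to be `1.0` followed by ten copies of `1e-16`:

- `naive_sum` returns exactly `1.0`, because each `1e-16` is below half an ulp of `1.0` and is rounded away.
- `kahan_sum` carries the lost bits in `c` and returns a value slightly above `1.0`, close to the exact sum `1.000000000000001`.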
Repository: rG LLVM Github Monorepo

https://reviews.llvm.org/D99675

Files:

- llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
- llvm/include/llvm/CodeGen/BasicTTIImpl.h
- llvm/include/llvm/CodeGen/ISDOpcodes.h
- llvm/include/llvm/IR/Intrinsics.td
- llvm/include/llvm/Support/TargetOpcodes.def
- llvm/include/llvm/Target/Target.td
- llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
- llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
```diff
Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
===================================================================
--- llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -7210,6 +7210,14 @@
     }
     break;
   }
+  case Intrinsic::arithmetic_fence: {
+    SDValue Val = getValue(FPI.getArgOperand(0));
+    SDValue N(DAG.getMachineNode(TargetOpcode::ARITH_FENCE, getCurSDLoc(),
+                                 Val.getValueType(), Val),
+              0);
+    setValue(&FPI, N);
+    return;
+  }
   }
 
   // A few strict DAG nodes carry additional operands that are not
Index: llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
===================================================================
--- llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+++ llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
@@ -1265,6 +1265,9 @@
       case TargetOpcode::PSEUDO_PROBE:
         emitPseudoProbe(MI);
         break;
+      case TargetOpcode::ARITH_FENCE:
+        OutStreamer->emitRawComment("ARITH_FENCE");
+        break;
       default:
         emitInstruction(&MI);
         if (CanDoExtraAnalysis) {
Index: llvm/include/llvm/Target/Target.td
===================================================================
--- llvm/include/llvm/Target/Target.td
+++ llvm/include/llvm/Target/Target.td
@@ -1172,6 +1172,12 @@
   let AsmString = "PSEUDO_PROBE";
   let hasSideEffects = 1;
 }
+def ARITH_FENCE : StandardPseudoInstruction {
+  let OutOperandList = (outs unknown:$dst);
+  let InOperandList = (ins unknown:$src);
+  let AsmString = "";
+  let hasSideEffects = false;
+}
 
 def STACKMAP : StandardPseudoInstruction {
   let OutOperandList = (outs);
Index: llvm/include/llvm/Support/TargetOpcodes.def
===================================================================
--- llvm/include/llvm/Support/TargetOpcodes.def
+++ llvm/include/llvm/Support/TargetOpcodes.def
@@ -117,6 +117,9 @@
 /// Pseudo probe
 HANDLE_TARGET_OPCODE(PSEUDO_PROBE)
 
+/// Arithmetic fence.
+HANDLE_TARGET_OPCODE(ARITH_FENCE)
+
 /// A Stackmap instruction captures the location of live variables at its
 /// position in the instruction stream. It is followed by a shadow of bytes
 /// that must lie within the function and not contain another stackmap.
Index: llvm/include/llvm/IR/Intrinsics.td
===================================================================
--- llvm/include/llvm/IR/Intrinsics.td
+++ llvm/include/llvm/IR/Intrinsics.td
@@ -1311,6 +1311,9 @@
 def int_pseudoprobe : Intrinsic<[], [llvm_i64_ty, llvm_i64_ty, llvm_i32_ty, llvm_i64_ty],
                                     [IntrInaccessibleMemOnly, IntrWillReturn]>;
 
+// Arithmetic fence intrinsic.
+def int_arithmetic_fence : Intrinsic<[llvm_anyfloat_ty], [LLVMMatchType<0>], [IntrNoMem]>;
+
 // Intrinsics to support half precision floating point format
 let IntrProperties = [IntrNoMem, IntrWillReturn] in {
 def int_convert_to_fp16 : DefaultAttrsIntrinsic<[llvm_i16_ty], [llvm_anyfloat_ty]>;
Index: llvm/include/llvm/CodeGen/ISDOpcodes.h
===================================================================
--- llvm/include/llvm/CodeGen/ISDOpcodes.h
+++ llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -1085,6 +1085,10 @@
   /// specifier.
   PREFETCH,
 
+  /// ARITH_FENCE - This corresponds to a arithmetic fence intrinsic. Both its
+  /// operand and output are the same floating type.
+  ARITH_FENCE,
+
   /// OUTCHAIN = ATOMIC_FENCE(INCHAIN, ordering, scope)
   /// This corresponds to the fence instruction. It takes an input chain, and
   /// two integer constants: an AtomicOrdering and a SynchronizationScope.
Index: llvm/include/llvm/CodeGen/BasicTTIImpl.h
===================================================================
--- llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -1515,6 +1515,7 @@
     case Intrinsic::lifetime_end:
     case Intrinsic::sideeffect:
     case Intrinsic::pseudoprobe:
+    case Intrinsic::arithmetic_fence:
       return 0;
     case Intrinsic::masked_store: {
       Type *Ty = Tys[0];
Index: llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
===================================================================
--- llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -567,6 +567,7 @@
     case Intrinsic::assume:
     case Intrinsic::sideeffect:
     case Intrinsic::pseudoprobe:
+    case Intrinsic::arithmetic_fence:
     case Intrinsic::dbg_declare:
     case Intrinsic::dbg_value:
     case Intrinsic::dbg_label:
```
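The three FMA options in the proposal (for the `%t1`/`%t2`/`%t3` example) are not only a performance question. A fused multiply-add rounds once, while a separate `fmul` and `fadd` round twice, so contracting across a fence can change the computed value.

This small sketch assumes a C++11 compiler and uses two hypothetical helpers:

- `fused_mul_add` calls `std::fma` for the single-rounding result.
- `separate_mul_add` stores the product in a `volatile` temporary, which keeps the compiler from contracting the multiply and the add on its own.

```cpp
#include <cassert>
#include <cmath>

// Two roundings: the product is rounded to double before the add.
// The volatile temporary stops the compiler from forming an FMA itself.
double separate_mul_add(double a, double b, double c) {
  volatile double product = a * b;
  return product + c;
}

// One rounding: std::fma computes a * b + c exactly, then rounds once.
double fused_mul_add(double a, double b, double c) {
  return std::fma(a, b, c);
}

// With a = 1 + 2^-30 and b = 1 - 2^-30, the exact product is 1 - 2^-60,
// which rounds to 1.0 in double precision, so the two forms disagree.
void check_fma_rounding() {
  const double a = 1.0 + std::ldexp(1.0, -30);
  const double b = 1.0 - std::ldexp(1.0, -30);
  assert(separate_mul_add(a, b, -1.0) == 0.0);
  assert(fused_mul_add(a, b, -1.0) == -std::ldexp(1.0, -60));
}
```

On inputs whose product is exact, such as `2.0 * 3.0 + 1.0`, both helpers return `7.0`. The difference appears only when the intermediate product needs rounding. That is the situation in which the `contract` flag decides whether a fenced multiply may be fused with the add that follows the fence.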
_______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits