spatel added inline comments.
================ Comment at: clang/lib/CodeGen/CGBuiltin.cpp:13829 CGM.getIntrinsic(Intrinsic::vector_reduce_fadd, Ops[1]->getType()); + Builder.getFastMathFlags().setAllowReassoc(true); return Builder.CreateCall(F, {Ops[0], Ops[1]}); ---------------- I haven't looked at this part of the compiler in a long time, so I was wondering how we handle FMF scope. It looks like there is already an FMFGuard object in place -- CodeGenFunction::CGFPOptionsRAII(). So setting FMF here will not affect anything but this CreateCall(). Does that match your understanding? Should we have an extra regression test to make sure that does not change? I am imagining something like: ``` double test_mm512_reduce_add_pd(__m512d __W, double ExtraAddOp) { double S = _mm512_reduce_add_pd(__W) + ExtraAddOp; return S; } ``` Then we could confirm that `reassoc` is not applied to the `fadd` that follows the reduction call. ================ Comment at: clang/lib/CodeGen/CGBuiltin.cpp:13829 CGM.getIntrinsic(Intrinsic::vector_reduce_fadd, Ops[1]->getType()); + Builder.getFastMathFlags().setAllowReassoc(true); return Builder.CreateCall(F, {Ops[0], Ops[1]}); ---------------- spatel wrote: > I haven't looked at this part of the compiler in a long time, so I was > wondering how we handle FMF scope. It looks like there is already an FMFGuard > object in place -- CodeGenFunction::CGFPOptionsRAII(). So setting FMF here > will not affect anything but this CreateCall(). > > Does that match your understanding? Should we have an extra regression test > to make sure that does not change? > > I am imagining something like: > > ``` > double test_mm512_reduce_add_pd(__m512d __W, double ExtraAddOp) { > double S = _mm512_reduce_add_pd(__W) + ExtraAddOp; > return S; > } > > ``` > > Then we could confirm that `reassoc` is not applied to the `fadd` that > follows the reduction call. Currently (and we could say that this is an LLVM codegen bug), we will not generate the optimal/expected reduction with `reassoc` alone. I think the x86 reduction definition is implicitly assuming that -0.0 is not meaningful here, so we should add `nsz` too. The backend is expecting an explicit `nsz` on this op. Ie, I see this x86 asm currently with only `reassoc`: ``` vextractf64x4 $1, %zmm0, %ymm1 vaddpd %zmm1, %zmm0, %zmm0 vextractf128 $1, %ymm0, %xmm1 vaddpd %xmm1, %xmm0, %xmm0 vpermilpd $1, %xmm0, %xmm1 vaddsd %xmm1, %xmm0, %xmm0 vxorpd %xmm1, %xmm1, %xmm1 <--- create 0.0 vaddsd %xmm1, %xmm0, %xmm0 <--- add it to the reduction result ``` Alternatively (and I'm not sure where it is specified), we could replace the default 0.0 argument with -0.0? ================ Comment at: clang/lib/Headers/avx512fintrin.h:9300 * outputs. This class of vector operation forms the basis of many scientific * computations. In vector-reduction arithmetic, the evaluation off is * independent of the order of the input elements of V. ---------------- This is an existing text bug, but if we are changing this text, we might as well fix it in this patch - I'm not sure what "off" refers to here. Should that be "order"? ================ Comment at: clang/lib/Headers/avx512fintrin.h:9303 + * For floating points type, we always assume the elements are reassociable even + * if -fast-math is off. ---------------- Typo: "floating-point types" ================ Comment at: clang/lib/Headers/avx512fintrin.h:9304 + * For floating points type, we always assume the elements are reassociable even + * if -fast-math is off. + ---------------- Also mention that sign of zero is indeterminate. We might use the LangRef text as a model for what to say here: https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fadd-intrinsic Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D96231/new/ https://reviews.llvm.org/D96231 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits