[PATCH] D96231: [X86] Always assign reassoc flag for intrinsics *reduce_add/mul_ps/pd.

Sanjay Patel via Phabricator via cfe-commits Mon, 08 Feb 2021 08:29:06 -0800

spatel added inline comments.


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:13829
         CGM.getIntrinsic(Intrinsic::vector_reduce_fadd, Ops[1]->getType());
+    Builder.getFastMathFlags().setAllowReassoc(true);
     return Builder.CreateCall(F, {Ops[0], Ops[1]});
----------------
I haven't looked at this part of the compiler in a long time, so I was 
wondering how we handle FMF scope. It looks like there is already an FMFGuard 
object in place -- CodeGenFunction::CGFPOptionsRAII(). So setting FMF here will 
not affect anything but this CreateCall(). 

Does that match your understanding? Should we have an extra regression test to 
make sure that does not change?

I am imagining something like:

```
double test_mm512_reduce_add_pd(__m512d __W, double ExtraAddOp) {
  double S = _mm512_reduce_add_pd(__W) + ExtraAddOp;
  return S;
}

```

Then we could confirm that `reassoc` is not applied to the `fadd` that follows 
the reduction call.


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:13829
         CGM.getIntrinsic(Intrinsic::vector_reduce_fadd, Ops[1]->getType());
+    Builder.getFastMathFlags().setAllowReassoc(true);
     return Builder.CreateCall(F, {Ops[0], Ops[1]});
----------------
spatel wrote:
> I haven't looked at this part of the compiler in a long time, so I was 
> wondering how we handle FMF scope. It looks like there is already an FMFGuard 
> object in place -- CodeGenFunction::CGFPOptionsRAII(). So setting FMF here 
> will not affect anything but this CreateCall(). 
> 
> Does that match your understanding? Should we have an extra regression test 
> to make sure that does not change?
> 
> I am imagining something like:
> 
> ```
> double test_mm512_reduce_add_pd(__m512d __W, double ExtraAddOp) {
>   double S = _mm512_reduce_add_pd(__W) + ExtraAddOp;
>   return S;
> }
> 
> ```
> 
> Then we could confirm that `reassoc` is not applied to the `fadd` that 
> follows the reduction call.
Currently (and we could say that this is an LLVM codegen bug), we will not 
generate the optimal/expected reduction with `reassoc` alone.

I think the x86 reduction definition is implicitly assuming that -0.0 is not 
meaningful here, so we should add `nsz` too.

The backend is expecting an explicit `nsz` on this op. Ie, I see this x86 asm 
currently with only `reassoc`:
        
```
        vextractf64x4   $1, %zmm0, %ymm1
        vaddpd  %zmm1, %zmm0, %zmm0
        vextractf128    $1, %ymm0, %xmm1
        vaddpd  %xmm1, %xmm0, %xmm0
        vpermilpd       $1, %xmm0, %xmm1      
        vaddsd  %xmm1, %xmm0, %xmm0 
        vxorpd  %xmm1, %xmm1, %xmm1   <--- create 0.0
        vaddsd  %xmm1, %xmm0, %xmm0   <--- add it to the reduction result
```

Alternatively (and I'm not sure where it is specified), we could replace the 
default 0.0 argument with -0.0?


================
Comment at: clang/lib/Headers/avx512fintrin.h:9300
  * outputs. This class of vector operation forms the basis of many scientific
  * computations. In vector-reduction arithmetic, the evaluation off is
  * independent of the order of the input elements of V.
----------------
This is an existing text bug, but if we are changing this text, we might as 
well fix it in this patch - I'm not sure what "off" refers to here. Should that 
be "order"?


================
Comment at: clang/lib/Headers/avx512fintrin.h:9303
 
+ * For floating points type, we always assume the elements are reassociable 
even
+ * if -fast-math is off.
----------------
Typo: "floating-point types"


================
Comment at: clang/lib/Headers/avx512fintrin.h:9304
+ * For floating points type, we always assume the elements are reassociable 
even
+ * if -fast-math is off.
+
----------------
Also mention that sign of zero is indeterminate. We might use the LangRef text 
as a model for what to say here:
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fadd-intrinsic


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96231/new/

https://reviews.llvm.org/D96231

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D96231: [X86] Always assign reassoc flag for intrinsics *reduce_add/mul_ps/pd.

Reply via email to