Il-Capitano wrote:

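`__builtin_reduce_add/mul` is defined to do _recursive even-odd pairwise 
reduction_ in Clang, and the `@llvm.vector.reduce.fadd` intrinsic doesn't do 
that: it reduces sequentially from a start value, or in an unspecified order 
when the `reassoc` flag is set.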
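
To make the difference concrete, here is a rough scalar sketch of the two orderings. The function names are made up for illustration, and the pairwise version is only my reading of "recursive even-odd pairwise reduction" (combine lanes 0 and 1, lanes 2 and 3, and so on, then recurse on the half-length result), not a copy of what Clang emits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sequential (ordered) reduction,
// ((v[0] + v[1]) + v[2]) + ...
float reduce_add_sequential(const std::vector<float>& v) {
    assert(!v.empty());
    float acc = v[0];
    for (std::size_t i = 1; i < v.size(); ++i)
        acc += v[i];
    return acc;
}

// Hypothetical helper: recursive even-odd pairwise reduction.
// Each round adds lane 2*i to lane 2*i+1 and halves the length.
// Assumes a power-of-two lane count, as with typical vector types.
float reduce_add_pairwise(std::vector<float> v) {
    assert(!v.empty() && (v.size() & (v.size() - 1)) == 0);
    while (v.size() > 1) {
        std::vector<float> next(v.size() / 2);
        for (std::size_t i = 0; i < next.size(); ++i)
            next[i] = v[2 * i] + v[2 * i + 1];
        v.swap(next);
    }
    return v[0];
}
```

With IEEE single precision and no fast-math flags, the input `{1e8f, 1.0f, -1e8f, 1.0f}` should give `1.0f` sequentially but `0.0f` pairwise, because each `1.0f` is absorbed when it is added to a value of magnitude `1e8f`. So for floating point the two definitions are not interchangeable.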

I'm not sure if there's any benefit to this definition over a sequential 
reduction. The only thing I can think of that could be affected is 
signed-integer overflow being UB, but that doesn't seem to be optimized for, 
or checked with UBSan: https://godbolt.org/z/TEsc86rf5.

Maybe, as RKSimon suggested, having a separate builtin for floating-point types 
could be better? Something like `__builtin_reduce_fadd/fmul` that does 
sequential reduction, and another one for unordered reduction?  
This builtin could also take a start value as the first argument to better 
match the LLVM intrinsic.
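
A scalar model of what such a start-value builtin could compute, purely as a sketch: `reduce_fadd_ordered` is a hypothetical name, and the semantics are assumed to mirror the ordered form of `@llvm.vector.reduce.fadd(start, vec)`.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical scalar model of an ordered fadd reduction with a start value:
// (((start + v[0]) + v[1]) + ...) + v[N-1]
template <std::size_t N>
float reduce_fadd_ordered(float start, const std::array<float, N>& v) {
    float acc = start;
    for (float x : v)
        acc += x;
    return acc;
}

// Small usage example: the start value is observable through signed zeros.
inline void demo_start_value() {
    const std::array<float, 4> zeros{-0.0f, -0.0f, -0.0f, -0.0f};
    // -0.0 is the neutral element for fadd, so the sign is preserved.
    assert(std::signbit(reduce_fadd_ordered(-0.0f, zeros)));
    // Starting from +0.0 flips the result to +0.0.
    assert(!std::signbit(reduce_fadd_ordered(0.0f, zeros)));
}
```

This is one reason an explicit start argument might be nicer than an implicit `0.0`: the caller chooses the neutral element instead of the builtin picking one.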

> Or maybe an alternative `__builtin_reduce_fastmath_fadd / fmul` that makes it 
> clear what's happening, and have them always emit the reassoc variants of the 
> reductions?

I think `__builtin_reduce_fadd_unordered` or something similar would be a 
better choice, since "fastmath" implies more restrictions/assumptions than 
`reassoc` alone (no NaNs, no signed zeros, etc.).

https://github.com/llvm/llvm-project/pull/120367
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits