Il-Capitano wrote:

`__builtin_reduce_add/mul` is defined in Clang to do _recursive even-odd pairwise reduction_, and the `@llvm.vector.reduce.fadd` intrinsic doesn't do that.
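
To make the difference concrete, here is a minimal scalar sketch of the two orderings. The function names `reduce_add_sequential` and `reduce_add_pairwise` are hypothetical and only for illustration; they are not Clang or LLVM APIs. The pairwise version assumes a power-of-two length, so it skips the step of padding with neutral elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential (ordered) reduction: ((start + v[0]) + v[1]) + ...
// This models the ordered form of @llvm.vector.reduce.fadd, which
// takes a start value followed by the vector.
float reduce_add_sequential(float start, const std::vector<float>& v) {
    float acc = start;
    for (float x : v) acc += x;
    return acc;
}

// Recursive even-odd pairwise reduction: add elements 2*i and 2*i+1,
// then repeat on the half-length result until one element remains.
// Assumption: the length is a nonzero power of two (no padding step).
float reduce_add_pairwise(std::vector<float> v) {
    assert(!v.empty() && (v.size() & (v.size() - 1)) == 0);
    while (v.size() > 1) {
        std::vector<float> half(v.size() / 2);
        for (std::size_t i = 0; i < half.size(); ++i)
            half[i] = v[2 * i] + v[2 * i + 1];
        v = half;
    }
    return v[0];
}
```

Assuming IEEE-754 single-precision arithmetic with no excess precision, the input `{1e8f, 1.0f, -1e8f, 1.0f}` gives `1.0f` sequentially (with a start value of `0.0f`) but `0.0f` pairwise, because both `1.0f` terms are absorbed when they are first added to `±1e8f`.
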
I'm not sure there's any benefit to this definition over a sequential reduction. The only thing I can think of that could be affected is signed-integer overflow being UB, but that doesn't seem to be optimized for, or checked by UBSan: https://godbolt.org/z/TEsc86rf5.

Maybe, as RKSimon suggested, having a separate builtin for floating-point types would be better? Something like `__builtin_reduce_fadd/fmul` that does sequential reduction, and another one for unordered reduction? The sequential builtin could also take a start value as its first argument to better match the LLVM intrinsic.

> Or maybe an alternative `__builtin_reduce_fastmath_fadd / fmul` that makes it
> clear what's happening, and have them always emit the reassoc variants of the
> reductions?

I think `__builtin_reduce_fadd_unordered` or something similar would be a better choice, since "fastmath" implies more restrictions/assumptions than `reassoc` (no NaNs, no signed zeros, etc.).

https://github.com/llvm/llvm-project/pull/120367
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits