Il-Capitano wrote: Just to clarify: `@llvm.vector.reduce.fadd` does sequential reduction by default, so I don't see a point in doing that manually in the frontend. Your previous implementation had the same behaviour.
The inconsistency with the sequential approach is that Clang defines the `__builtin_reduce_*` operations to do recursive even-odd pairwise reduction (i.e. `(v[0] + v[1]) + (v[2] + v[3])` instead of `((v[0] + v[1]) + v[2]) + v[3]`), and since floating-point addition is not associative, these two orders can produce different results.

My suggestion was not to reuse the existing `__builtin_reduce_add` and `__builtin_reduce_mul` builtins, but to define two new ones: one defined to do sequential reduction, matching the behaviour of `@llvm.vector.reduce.fadd`, and one that is unordered, i.e. `@llvm.vector.reduce.fadd` with the `reassoc` fast-math flag set. In practice the unordered variant will end up doing the even-odd pairwise reduction anyway, but there is a difference in generated-code quality between expanding it in the frontend and letting the backend do it: https://godbolt.org/z/a4rd44Eza. You can see the difference in generated code for `@llvm.vector.reduce.fadd` with and without the `reassoc` flag here: https://godbolt.org/z/zeWjxrxo5.

https://github.com/llvm/llvm-project/pull/120367
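To make the non-associativity point concrete, here is a small standalone C sketch (not part of the patch; it uses plain scalar floats and assumes binary32 arithmetic without excess precision, i.e. `FLT_EVAL_METHOD == 0`) showing that the sequential and even-odd pairwise orders can round to different results:

```c
#include <stdio.h>

int main(void) {
  /* Values chosen so that float rounding makes the two reduction orders
     disagree: 1e8f + 1.0f rounds back to 1e8f in binary32.
     volatile prevents the compiler from constant-folding the sums. */
  volatile float v0 = 1e8f, v1 = 1.0f, v2 = -1e8f, v3 = 1.0f;

  /* Sequential (in-order) reduction, the order @llvm.vector.reduce.fadd
     uses without fast-math flags: ((v0 + v1) + v2) + v3. */
  float seq = ((v0 + v1) + v2) + v3;

  /* Even-odd pairwise reduction, the order Clang documents for the
     existing __builtin_reduce_* operations: (v0 + v1) + (v2 + v3). */
  float pair = (v0 + v1) + (v2 + v3);

  printf("sequential: %g\n", seq);   /* prints 1 */
  printf("pairwise:   %g\n", pair);  /* prints 0 */
  return 0;
}
```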