Il-Capitano wrote:

Just to clarify: `@llvm.vector.reduce.fadd` does sequential reduction by 
default, so I don't see a point in doing that manually in the frontend. Your 
previous implementation had the same behaviour.

The inconsistency with the sequential approach is that Clang defines the 
`__builtin_reduce_*` operations to do recursive even-odd pairwise reduction 
(i.e. `(v[0] + v[1]) + (v[2] + v[3])` instead of `((v[0] + v[1]) + v[2]) + 
v[3]`), and since floating-point addition is not associative, the two orders 
can produce different results.
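
To make that concrete, here is a small standalone example (not from the patch; 
the input values are just chosen to expose the rounding, and the results assume 
plain IEEE single-precision evaluation):

```cpp
#include <cstdio>

int main() {
  // Illustrative values only: picked so that single-precision rounding
  // differs between the two association orders.
  float v[4] = {1e8f, 1.0f, -1e8f, 1.0f};

  // Sequential (left-to-right), as @llvm.vector.reduce.fadd does without
  // fast-math flags: 1e8f + 1.0f rounds back to 1e8f, the -1e8f cancels it,
  // and the trailing 1.0f survives.
  float seq = ((v[0] + v[1]) + v[2]) + v[3];

  // Recursive even-odd pairwise, as Clang documents for __builtin_reduce_*:
  // both partial sums round to +/-1e8f, so the 1.0f contributions are lost.
  float pair = (v[0] + v[1]) + (v[2] + v[3]);

  std::printf("sequential = %g, pairwise = %g\n", seq, pair);  // prints 1 and 0
  return 0;
}
```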

My suggestion was not to reuse the existing `__builtin_reduce_add` and 
`__builtin_reduce_mul` builtins, but to define two new ones: one specified to do 
sequential reduction, matching the behaviour of `@llvm.vector.reduce.fadd`, and 
one that is unordered, i.e. `@llvm.vector.reduce.fadd` with the `reassoc` 
fast-math flag set. In practice the unordered form will end up doing the 
even-odd pairwise reduction anyway, but there is a difference in generated code 
quality between doing that in the frontend and in the backend: 
https://godbolt.org/z/a4rd44Eza.
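
As a rough sketch of the semantics I have in mind (the names and signatures 
below are placeholders, not a proposal for the actual spelling), in plain C++:

```cpp
#include <cstddef>

// Placeholder for a builtin defined to reduce strictly left to right,
// matching @llvm.vector.reduce.fadd without fast-math flags.
template <typename T, size_t N>
T reduce_add_seq(const T (&v)[N]) {
  T acc = v[0];
  for (size_t i = 1; i < N; ++i)
    acc = acc + v[i];  // fixed association: ((v[0] + v[1]) + v[2]) + ...
  return acc;
}

// Placeholder for the unordered variant: the association is unspecified,
// matching @llvm.vector.reduce.fadd with the reassoc flag, which the backend
// is free to lower as the even-odd pairwise tree shown here (power-of-two N).
template <typename T, size_t N>
T reduce_add_unordered(const T (&v)[N]) {
  T tmp[N];
  for (size_t i = 0; i < N; ++i) tmp[i] = v[i];
  for (size_t width = N; width > 1; width /= 2)
    for (size_t i = 0; i < width / 2; ++i)
      tmp[i] = tmp[2 * i] + tmp[2 * i + 1];
  return tmp[0];
}
```

On the float values from the example above, `reduce_add_seq` returns 1.0f and 
`reduce_add_unordered` (with the pairwise lowering) returns 0.0f.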

You can see the generated code difference of `@llvm.vector.reduce.fadd` with 
and without the `reassoc` flag here: https://godbolt.org/z/zeWjxrxo5.

https://github.com/llvm/llvm-project/pull/120367