Andrew Stubbs <a...@codesourcery.com> writes:
> On 07/07/2020 12:03, Richard Sandiford wrote:
>> Andrew Stubbs <a...@codesourcery.com> writes:
>>> This patch implements a floating-point fold_left_plus vector pattern,
>>> which gives a significant speed-up in the BabelStream "dot" benchmark.
>>>
>>> The GCN architecture can't actually do an in-order vector reduction any
>>> more efficiently than the equivalent scalar algorithm, so this is a bit
>>> of a cheat. However, dividing the problem into threads using OpenACC or
>>> OpenMP has already broken the in-order semantics, so we may as well
>>> optimize the operation at the vector level too.
>>>
>>> If the user has specifically sorted the input data in order to get a
>>> more correct FP result, then using multiple threads is already the wrong
>>> thing to do. But if the input data is in no particular numerical order,
>>> then this optimization will give a correct answer much faster, albeit
>>> possibly a slightly different one on each run.
>>
>> There doesn't seem to be anything GCN-specific here, though.
>> If pragmas say that we can ignore associativity rules, we should apply
>> that in target-independent code rather than in each individual target.
>
> Yes, I'm lazy. That, and I'm not sure what a target-independent solution
> would look like.
>
> Presumably we'd need something for both OpenMP and OpenACC, and it would
> need to be specific to certain operations (not just a blanket
> -fassociative-math), which means the vectorizer (anywhere else?) would
> need to be taught about the new thing?
>
> The nearest example I can think of is the force_vectorize flag that
> OpenMP "simd" and OpenACC "vector" already use (the latter being
> amdgcn-only, as nvptx does its own OpenACC vectorization).
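[Editorial sketch, not part of the thread: the kind of dot-product kernel being discussed can be written as below. The function name and signature are illustrative, not taken from the patch or from BabelStream. The point is that the OpenMP `reduction(+:sum)` clause already allows per-thread partial sums to be combined in an unspecified order, so strict left-to-right FP semantics are relaxed before the vectorizer is ever involved.]

```c
/* Illustrative BabelStream-style "dot" kernel (hypothetical, not from
   the patch).  The reduction(+:sum) clause lets the OpenMP runtime
   form per-thread partial sums and combine them in an unspecified
   order, so in-order FP addition semantics are already given up here.
   Without -fopenmp the pragma is ignored and the loop runs serially.  */
#include <stddef.h>

float dot(const float *a, const float *b, size_t n)
{
  float sum = 0.0f;
#pragma omp parallel for reduction(+:sum)
  for (size_t i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}
```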
Yeah, I guess we'd need a way of querying whether a given reduction is
by nature reassociative due to pragmas. It would probably need to name
the specific reductions somehow. Agreed, it doesn't sound easy.

> I'm also not completely convinced that this -- or other cases like it --
> isn't simply a target-specific issue. Could it be harmful on other
> architectures?

I'd hope not. No target should prefer in-order reductions over
any-order reductions, since in-order implements any-order.

Thanks,
Richard
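[Editorial sketch, not part of the thread: to make the in-order versus any-order distinction concrete, the snippet below (illustrative names, not GCC code) contrasts a strict fold-left float summation, which is what `fold_left_plus` must preserve, with a reassociated pairwise summation of the kind a vectorizer prefers. The two orderings can legitimately round differently, which is exactly the slack that `-fassociative-math` or a reduction pragma grants.]

```c
/* Illustrative only: a strict in-order (fold-left) reduction versus a
   reassociated pairwise reduction.  In IEEE single precision the two
   groupings can produce different results for the same input.  */

static float fold_left_sum(const float *a, int n)
{
  float acc = 0.0f;
  for (int i = 0; i < n; i++)
    acc += a[i];              /* strictly left to right */
  return acc;
}

static float pairwise_sum(const float *a, int n)
{
  if (n == 1)
    return a[0];
  int half = n / 2;           /* reassociate: sum the two halves independently */
  return pairwise_sum(a, half) + pairwise_sum(a + half, n - half);
}
```

With the hypothetical input `{1.0f, 1e8f, -1e8f, 1.0f}` on an IEEE single-precision target, the in-order sum yields 1.0f while the pairwise sum yields 0.0f: 1e8f swamps 1.0f, and the grouping decides which additions round the small term away.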