Andrew Stubbs <a...@codesourcery.com> writes:
> On 07/07/2020 12:03, Richard Sandiford wrote:
>> Andrew Stubbs <a...@codesourcery.com> writes:
>>> This patch implements a floating-point fold_left_plus vector pattern,
>>> which gives a significant speed-up in the BabelStream "dot" benchmark.
>>>
>>> The GCN architecture can't actually do an in-order vector reduction any
>>> more efficiently than the equivalent scalar algorithm, so this is a bit
>>> of a cheat.  However, dividing the problem into threads using OpenACC or
>>> OpenMP has already broken the in-order semantics, so we may as well
>>> optimize the operation at the vector level too.
>>>
>>> If the user has specifically sorted the input data in order to get a
>>> more correct FP result then using multiple threads is already the wrong
>>> thing to do. But, if the input data is in no particular numerical order
>>> then this optimization will give a correct answer much faster, albeit
>>> possibly a slightly different one each run.
>> 
>> There doesn't seem to be anything GCN-specific here though.
>> If pragmas say that we can ignore associativity rules, we should apply
>> that in target-independent code rather than in each individual target.
>
> Yes, I'm lazy. That, and I'm not sure what a target independent solution 
> would look like.
>
> Presumably we'd need something for both OpenMP and OpenACC, and it would 
> need to be specific to certain operations (not just blanket 
> -fassociative-math), which means the vectorizer (anywhere else?) would 
> need to be taught about the new thing?
>
> The nearest example I can think of is the force_vectorize flag that 
> OpenMP "simd" and OpenACC "vector" already use (the latter being 
> amdgcn-only as nvptx does its own OpenACC vectorization).

Yeah, I guess we'd need a way of querying whether a given reduction
is by nature reassociative due to pragmas.  It would probably need
to name the specific reductions somehow.

Agree it doesn't sound easy.
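
For illustration only, here is a minimal sketch of the kind of source-level
annotation under discussion: an OpenMP reduction clause that names both the
operator ("+") and the variable ("sum"). The function name "dot" and its
signature are hypothetical, chosen to echo the BabelStream "dot" benchmark
mentioned above. How such information would be passed on to the vectorizer
is the open question and is not shown here.

```c
#include <stddef.h>

/* Hypothetical dot-product kernel.  The reduction clause names the
   specific reduction (operator "+" on variable "sum") that may be
   combined in any order across threads and SIMD lanes.  */
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;

    /* Without -fopenmp the pragma is ignored and the loop runs
       sequentially, i.e. strictly in order.  */
    #pragma omp parallel for simd reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];

    return sum;
}
```

With small integer-valued inputs every partial sum is exactly representable,
so the result is the same whatever order is used; with general
floating-point data the result may vary with the order of combination.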

> I'm also not completely convinced that this -- or other cases like it -- 
> isn't simply a target-specific issue. Could it be harmful on other 
> architectures?

I'd hope not.  No target should prefer in-order reductions over
any-order reductions, since an in-order reduction is itself a valid
implementation of an any-order one.
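
To make the distinction concrete, here is a rough sketch, not taken from the
patch: one function sums strictly left to right, the other keeps a few
independent partial sums and combines them at the end, which is roughly what
a vectorized reduction does. The function names and the LANES value are made
up for the example.

```c
#include <stddef.h>

/* Illustrative lane count; real vector widths differ by target.  */
#define LANES 4

/* In-order reduction: strictly left to right, as fold_left_plus
   semantics require.  */
float sum_in_order(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Any-order reduction: LANES independent partial sums, combined
   pairwise at the end.  This reassociates the additions.  */
float sum_any_order(const float *x, size_t n)
{
    float part[LANES] = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (size_t i = 0; i < n; i++)
        part[i % LANES] += x[i];
    return (part[0] + part[1]) + (part[2] + part[3]);
}
```

A caller that accepts sum_any_order's result must also accept
sum_in_order's, since left to right is just one of the permitted orders;
the converse does not hold.  For inputs with large cancellations, such as
{ 1e8f, 1.0f, -1e8f, 1.0f }, the two functions can return different values
in single precision.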

Thanks,
Richard
