stores.

Andrew Stubbs Mon, 17 Sep 2018 02:40:30 -0700

On 17/09/18 10:14, Richard Sandiford wrote:

<a...@codesourcery.com> writes:

If the autovectorizer tries to load a GCN 64-lane vector elementwise then it
blows away the register file and produces horrible code.


Do all the registers really need to be live at once, or is it "just" bad
scheduling?  I'd have expected the initial rtl to load each element and
then insert it immediately, so that the number of insertions doesn't
directly affect register pressure.
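
For illustration only, a rough C model of the element-by-element construction discussed here. It is a sketch, not GCC internals or the code from the patch: the function name `load_elementwise`, the `stride` parameter and the `LANES` macro are hypothetical. Each iteration does one scalar load followed immediately by one insert, so in this model only a single temporary is live at a time.

```c
#include <stddef.h>

#define LANES 64  /* GCN vector width mentioned in this thread */

/* Hypothetical model: build a 64-lane vector one element at a time.
   Each iteration performs one scalar load and one insert into the
   destination, so no more than one loaded value is pending at once.  */
void load_elementwise(int *vec, const int *base, size_t stride)
{
    for (size_t lane = 0; lane < LANES; lane++) {
        int elt = base[lane * stride];  /* scalar load of one element */
        vec[lane] = elt;                /* insert into the target lane */
    }
}
```

Whether the generated code keeps this load-then-insert ordering depends on scheduling and register allocation, which is the question raised above.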

They don't need to be live at once, architecturally speaking, but that's the way it happened. No doubt there is another solution to fix it, but it's not a use case I believe we want to spend time optimizing.

Actually, I've not tested what happens without this in GCC 9, so that's probably worth checking, but I'd still be concerned about it blowing up on real code somewhere.

This patch simply disallows elementwise loads for such large vectors.  Is there
a better way to disable this in the middle-end?


Do you ever want elementwise accesses for GCN?  If not, it might be
better to disable them in the target's cost model.

The hardware is perfectly capable of extracting or setting vector elements, but given that it can do full gather/scatter from arbitrary addresses it's not something we want to do in general.

A normal scalar load will use a vector register (lane 0). The value then has to be moved to a scalar register, and only then can v_writelane insert it into the final destination.

Alternatively you could use a mask_load to load the value directly to the correct lane, but I don't believe that's something GCC does.
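
The two paths described above can be sketched in plain C. This is a simplified model under stated assumptions, not GCN code or anything GCC emits: the `vec64` type and both function names are hypothetical, and the "registers" are ordinary variables.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 64

/* Hypothetical model of one 64-lane vector register.  */
typedef struct { int32_t lane[LANES]; } vec64;

/* Path 1: the scalar load lands in lane 0 of a temporary vector
   register, is moved to a scalar register, and is then inserted
   into the destination lane (modelling a v_writelane-style step).  */
static void insert_via_writelane(vec64 *dst, const int32_t *addr,
                                 unsigned lane)
{
    vec64 tmp = {{0}};
    tmp.lane[0] = *addr;         /* load uses lane 0 of a vector reg */
    int32_t sreg = tmp.lane[0];  /* move to a scalar register */
    dst->lane[lane] = sreg;      /* write into the final lane */
}

/* Path 2: a masked load.  Only lanes whose mask bit is set read
   from their address; all other lanes are left untouched.  */
static void insert_via_masked_load(vec64 *dst,
                                   const int32_t *const *addrs,
                                   uint64_t mask)
{
    for (unsigned l = 0; l < LANES; l++)
        if (mask & (UINT64_C(1) << l))
            dst->lane[l] = *addrs[l];
}
```

In this model the first path takes three steps per element, while the second writes the selected lane directly.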


Andrew

Re: [PATCH 14/25] Disable inefficient vectorization of elementwise loads/stores.
