Hi all,
I'm trying to fix an infinite loop in the "reload" pass (LRA). I need
early-clobber outputs on my load instructions, and LRA goes wrong when
register pressure is high.
Is there a proper way to fix this? Or do I need to do something "hacky"
like fixing a register for use with reloads?
Here's the background.
AMD GCN has a thing called XNACK mode in which load instructions can be
interrupted (by a page miss, for example) and therefore need to be
written such that they are "restartable". This basically means that the
output must not overwrite the input registers (it can happen that a load
is partially successful, especially for vectors, but I believe
overwriting the address and offsets is never safe, even for scalars). Up
to now we've not needed this mode, but it will be needed for Unified
Shared Memory (and theoretically for APU devices).
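To make the restartability constraint concrete, here is a minimal C
sketch. It is a toy model only, not GCN code: the register file, the
memory contents, and the names load2 and run are all made up for
illustration. It models a two-dword load that faults after the first
dword and is then re-issued.

```c
#include <assert.h>

/* Toy memory: mem[i] == 15 - i, so every value is also a valid index. */
static unsigned mem[16] = {15, 14, 13, 12, 11, 10, 9, 8,
                           7, 6, 5, 4, 3, 2, 1, 0};

/* Hypothetical 2-dword load: reg[dst], reg[dst+1] = mem[reg[addr]...].
   fault_after simulates a page miss after that many dwords were
   written (-1 means no fault).  Returns 1 if complete, 0 if the
   instruction was interrupted and must be re-issued. */
static int load2(unsigned reg[], int dst, int addr, int fault_after)
{
    unsigned base = reg[addr];      /* address is read at issue time */
    for (int i = 0; i < 2; i++) {
        if (i == fault_after)
            return 0;               /* interrupted part-way through */
        reg[dst + i] = mem[base + i];
    }
    return 1;
}

/* Issue the load with address 2, fault once after the first dword,
   then restart it.  Returns the first destination register. */
static unsigned run(int dst, int addr)
{
    unsigned reg[4] = {0, 0, 0, 0};
    reg[addr] = 2;
    if (!load2(reg, dst, addr, 1))
        load2(reg, dst, addr, -1);  /* restart re-reads reg[addr] */
    return reg[dst];
}

/* Disjoint registers restart correctly; overlapping ones do not. */
static void self_check(void)
{
    assert(run(2, 0) == 13);        /* mem[2], as intended */
    assert(run(0, 0) != 13);        /* address was clobbered */
}
```

With the output in a different register the restarted load still sees
address 2 and returns mem[2]. With the output overlapping the address,
the first attempt overwrites the address, so the restart loads from the
wrong place. Early-clobber on the output is what rules out the second
case.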
So I have added new alternatives to my machine description that mark
the output operand as early-clobber ("&"):
[v ,RF ;flat ,* ,12,* ,off] flat_load%o1\t%0, %A1%O1%g1
[&v ,RF ;flat ,* ,12,* ,on ] ^
(The "on" and "off" represent the XNACK mode.)
LRA then generates a register "Assignment" section in the dump, but it's
not happy for some reason and generates another, and another, each with
more and more pseudo registers and insns, and it goes on forever until
the dump file is gigabytes and I kill it.
This is a vague description, sorry, because I don't really understand
what's going on here and the dump files are huge with tens of thousands
of pseudo registers to wade through. I'm hoping somebody recognises the
issue without me spending days on it.
I have a workaround: there's no known failure on devices that have the
AVGPR register file (they use it as spill space and therefore don't need
the memory loads), and I don't actually need XNACK on the older devices
at this time. But that probably just pushes the problem further down the
road, so if there's a better solution then I'd like to find it.
Thanks in advance
Andrew