On 11/17/22 21:53, Palmer Dabbelt wrote:
On Thu, 17 Nov 2022 14:44:31 PST (-0800), jeffreya...@gmail.com wrote:
On 11/8/22 12:55, Philipp Tomsich wrote:
If we are testing a register or a paradoxical subreg (i.e. anything
that is not a partial subreg) for equality/non-equality with zero, we
can generate a branch that compares against $zero. This will work for
QI, HI, SI and DImode, so we enable this for ANYI.
2020-08-30 gcc/ChangeLog:
* config/riscv/riscv.md (*branch<mode>_equals_zero): Added pattern.
I've gone back and forth on this a few times. As you know, I hate
subregs in the target descriptions and I guess I need to extend that to
querying if something is a subreg or not rather than just subregs
appearing in the RTL.
Presumably the idea behind rejecting partial subregs is that the bits
outside the partial subreg are unspecified, but that's also going to be true if we're
looking at a hardreg in QImode (for example) irrespective of it being
wrapped in a subreg.
I don't doubt it works the vast majority of the time, but I haven't been
able to convince myself it'll work all the time. How do we ensure that
the bits outside the mode are zero? I've been bitten by this kind of
problem before, and it's safe to say it was exceedingly painful to find.
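To make the concern concrete, here is a minimal C sketch of the failure mode, not the patch's actual RTL. It assumes XLEN=64 and models a register as a plain `uint64_t`; the names `xreg_t`, `full_reg_is_zero` and `qi_is_zero` are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a 64-bit register whose low 8 bits hold a QImode
   value.  Nothing in this model constrains the upper 56 bits.  */
typedef uint64_t xreg_t;

/* What a branch against $zero tests: every bit of the register.  */
bool full_reg_is_zero(xreg_t r)
{
    return r == 0;
}

/* What an equality test in QImode means: only the low 8 bits.  */
bool qi_is_zero(xreg_t r)
{
    return (uint8_t)r == 0;
}
```

With a register value of 0x100 the QImode value is zero but the full register is not, so the two tests disagree. The full-register branch is only a valid replacement if something guarantees that the bits above the mode cannot make a zero value look non-zero.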
I don't really understand the middle-end issues here (if there are
any?), but I'm pretty sure code like this has passed by a few times
before and we've yet to find a reliable way to optimize these cases.
There's a bunch of patterns where knowing the XLEN-extension of
shorter values would let us generate better code, but there's also
cases where we'd generate worse code by ensuring any extension scheme is
followed.
It's not really the extension scheme, though that is a subset of the
concerns in this space. Essentially we have to be 100% sure that the
bits outside of the branch mode (QI/HI/SI), up to XLEN, are zero; it's
not just the sign bit. This becomes even more of a concern as we exploit
the bitmanip extensions more aggressively.
The SUBREG check is supposed to avoid that problem, but I'm not
convinced it's sufficient.
Philipp claims that PROMOTE_MODE plus WORD_REGISTER_OPERATIONS is
sufficient here, but I'm not sure that's the case. He's digging out
the rationale from some internal archives which we'll dig into once he
finds it.
I'd be happy to be proved wrong :-)
jeff