On Thu, Sep 03, 2020 at 02:07:24PM -0300, Alexandre Oliva wrote:
> On Sep  3, 2020, Segher Boessenkool <seg...@kernel.crashing.org> wrote:
> > On Thu, Sep 03, 2020 at 07:03:53AM -0300, Alexandre Oliva wrote:
> >> On Sep  2, 2020, Segher Boessenkool <seg...@kernel.crashing.org> wrote:
> >> >> we might succeed, but only if we had a pattern
> >> >> that matched add<mode>3_cc_overflow_1's parallel with the flag-setter as
> >> >> the second element of the parallel, because that's where combine adds it
> >> >> to the new i3 pattern, after splitting it out of i2.
> >> 
> >> > That sounds like the backend pattern has it wrong then?  There is a
> >> > canonical order for this?
> >> 
> >> Much as I can tell, there isn't, it's just an arbitrary choice of
> >> backends, some do it one way or the other, and that causes combine to be
> >> able to perform some combinations but not others.
> 
> > For instructions that inherently set a condition code register, the
> > @code{compare} operator is always written as the first RTL expression of
> > the @code{parallel} instruction pattern.
> 
> Interesting.  I'm pretty sure I read email recently that suggested it
> was really up to the port, but I've caught up with GCC emails from years
> ago, so that might have been it.  Or I just misremember.  Whatever.

I think you remember right.  But combine depends on the documented
order, and so does compare-elim (since 4f0473fe89e6), so the documented
order is now the only acceptable one.

> The x86 pattern that fails to match in combine has the flags setter
> first, but combine places it second, after splitting it out of i2 and
> then appending it back to i3.

What does that RTL look like exactly?  This canonical form applies only
when one set writes the flags as a compare against 0 of the same value
that the other set stores (hrm, I hope you can make sense of that).
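
As a rough sketch of that form, following the documented rule quoted
above (FLAGS_REG, the CC and SI modes, and the operand predicates are
placeholders for illustration, not taken from any port's actual
pattern):

```
;; Hypothetical pattern: FLAGS_REG, the modes and the predicates
;; are assumptions made for this sketch only.
(parallel
  [;; First element: the flags set, a compare against 0 ...
   (set (reg:CC FLAGS_REG)
        (compare:CC
          (plus:SI (match_operand:SI 1 "register_operand")
                   (match_operand:SI 2 "register_operand"))
          (const_int 0)))
   ;; ... of the same value that the second element stores.
   (set (match_operand:SI 0 "register_operand")
        (plus:SI (match_dup 1) (match_dup 2)))])
```

A compare against anything other than (const_int 0), or of a value
different from what the other set stores, falls outside this rule.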

> Alas, it would be just as legitimate for combine to go the opposite way,
> substituting the flags set into another insn, and then tacking the other
> set onto the substituted-into insn.

Combine always generates the canonical form (for this, anyway; and it is
a missed optimisation bug if it makes something non-canonical anywhere).

Do you have a simple testcase?  Or a -fdump-rtl-combine-all dump?


Segher
