On Thu, Sep 03, 2020 at 02:07:24PM -0300, Alexandre Oliva wrote:
> On Sep 3, 2020, Segher Boessenkool <seg...@kernel.crashing.org> wrote:
> > On Thu, Sep 03, 2020 at 07:03:53AM -0300, Alexandre Oliva wrote:
> >> On Sep 2, 2020, Segher Boessenkool <seg...@kernel.crashing.org> wrote:
> >> >> we might succeed, but only if we had a pattern
> >> >> that matched add<mode>3_cc_overflow_1's parallel with the flag-setter as
> >> >> the second element of the parallel, because that's where combine adds it
> >> >> to the new i3 pattern, after splitting it out of i2.
> >>
> >> > That sounds like the backend pattern has it wrong then? There is a
> >> > canonical order for this?
> >>
> >> Much as I can tell, there isn't, it's just an arbitrary choice of
> >> backends, some do it one way or the other, and that causes combine to be
> >> able to perform some combinations but not others.
>
> > For instructions that inherently set a condition code register, the
> > @code{compare} operator is always written as the first RTL expression of
> > the @code{parallel} instruction pattern.
>
> Interesting. I'm pretty sure I read email recently that suggested it
> was really up to the port, but I've caught up with GCC emails from years
> ago, so that might have been it. Or I just misremember. Whatever.

I think you remember right. But combine depends on the documented
order, and so does compare-elim (since 4f0473fe89e6), so now the
documented order is always the only wanted one.

> The x86 pattern that fails to match in combine has the flags setter
> first, but combine places it second, after splitting it out of i2 and
> then appending it back to i3.

What does that RTL look like exactly? This canonical form is only for
a set of the flags as a compare to 0 of what the other set sets (hrm, I
hope you can make sense of that).

> Alas, it would be just as legitimate for combine to go the opposite way,
> substituting the flags set into another insn, and then tacking the other
> set onto the substituted-into insn.

Combine always generates the canonical form (for this, anyway; and it is
a missed optimisation bug if it makes something non-canonical anywhere).

Do you have a simple testcase? Or a -fdump-rtl-combine-all dump.


Segher
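
For reference, the documented order quoted above (flags-setting compare
first, then the set of the result) might look roughly like this in a
machine description. This is only a sketch, modeled on the kind of
example the GCC internals manual gives. The pattern name, the CCZ and SI
modes, FLAGS_REG, the predicates and constraints, and the output template
are all placeholders chosen for illustration. It is not the actual x86
add<mode>3_cc_overflow_1 pattern from this thread.

```
;; Sketch only: name, modes, FLAGS_REG, predicates, constraints and the
;; output template are illustrative assumptions, not a real port's pattern.
(define_insn "*add_and_set_flags_sketch"
  [;; First element: the flags set, a compare against 0 of the value
   ;; that the second set computes.
   (set (reg:CCZ FLAGS_REG)
        (compare:CCZ
          (plus:SI (match_operand:SI 1 "register_operand" "%r")
                   (match_operand:SI 2 "register_operand" "r"))
          (const_int 0)))
   ;; Second element: the set of the arithmetic result itself.
   (set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_dup 1) (match_dup 2)))]
  ""
  "addl %0, %1, %2")
```

A pattern with the two sets in the opposite order describes the same
operation, but would not match what combine and compare-elim generate,
assuming they emit the documented order as described above.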