On Sep 3, 2020, Segher Boessenkool <seg...@kernel.crashing.org> wrote:
> On Thu, Sep 03, 2020 at 02:07:24PM -0300, Alexandre Oliva wrote: >> On Sep 3, 2020, Segher Boessenkool <seg...@kernel.crashing.org> wrote: >> > On Thu, Sep 03, 2020 at 07:03:53AM -0300, Alexandre Oliva wrote: >> >> On Sep 2, 2020, Segher Boessenkool <seg...@kernel.crashing.org> wrote: >> >> >> we might succeed, but only if we had a pattern >> >> >> that matched add<mode>3_cc_overflow_1's parallel with the flag-setter >> >> >> as >> >> >> the second element of the parallel, because that's where combine adds >> >> >> it >> >> >> to the new i3 pattern, after splitting it out of i2. >> >> >> >> > That sounds like the backend pattern has it wrong then? There is a >> >> > canonical order for this? >> >> >> >> Much as I can tell, there isn't, it's just an arbitrary choice of >> >> backends, some do it one way or the other, and that causes combine to be >> >> able to perform some combinations but not others. >> >> > For instructions that inherently set a condition code register, the >> > @code{compare} operator is always written as the first RTL expression of >> > the @code{parallel} instruction pattern. >> >> Interesting. I'm pretty sure I read email recently that suggested it >> was really up to the port, but I've caught up with GCC emails from years >> ago, so that might have been it. Or I just misremember. Whatever. > I think you remember right. But combine depends on the documented > order Except when it doesn't ;-) Under: /* If the actions of the earlier insns must be kept in addition to substituting them into the latest one, we must make a new PARALLEL for the latest insn to hold additional the SETs. */ it turns the RMW add pattern into a PARALLEL and adds to it i0's flags-setter, that had been split out of i2. The PARALLEL has the flag-setter as the second in the parallel, as you can see below, and nothing (other than my patch) attempts to change that order. > What does that RTL look like exactly? Trying 5, 7 -> 15: 5: r91:SI=zero_extend([`i']) 7: {flags:CCC=cmp(r91:SI#0+0x1,r91:SI#0);r92:QI=r91:SI#0+0x1;} REG_DEAD r91:SI 15: [`i']=r92:QI REG_DEAD r92:QI Successfully matched this instruction: (parallel [ (set (mem/c:QI (symbol_ref:DI ("i") [flags 0x2] <var_decl 0x7ffff7ff7b4 0 i>) [0 i+0 S1 A8]) (plus:QI (mem/c:QI (symbol_ref:DI ("i") [flags 0x2] <var_decl 0x7ff ff7ff7b40 i>) [0 i+0 S1 A8]) (const_int 1 [0x1]))) (set (reg:CCC 17 flags) (compare:CCC (plus:QI (mem/c:QI (symbol_ref:DI ("i") [flags 0x2] <var_decl 0x7ffff7ff7b40 i>) [0 i+0 S1 A8]) (const_int 1 [0x1])) (mem/c:QI (symbol_ref:DI ("i") [flags 0x2] <var_decl 0x7ffff7ff7b40 i>) [0 i+0 S1 A8]))) ]) The reason it successfully matched in this combine dump is that the above had the permuting recog for combine that I posted earlier enabled. Without that patch, it wouldn't be recognized, and ended up rejected. > Do you have a simple testcase? Or a -fdump-rtl-combine-all dump. Not so simple, I'm afraid. I mean, the testcase is simple, but reproducing this behavior isn't. The small testcase I'm using only gets to that point if: - we allow flags to pass through extended asms, as in the other patch - I arrange for combine to give the above set of insns a second shot, after the removal of the intervening flag-store insn, that prevents the combination in the first try typedef unsigned char T; T i; int inc () { T c = __builtin_add_overflow (i, 1, &i); asm ("" : "+m" (i) : : "none"); i += c; } -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer