Hi!

On Mon, Jul 13, 2020 at 07:25:37AM +0200, Hans-Peter Nilsson wrote:
> > > > > TL;DR: fixing a misdetection of what is a "simple move".
> > > >
> > > > That is not a very correct characterisation of what this does :-)
> > >
> > > That's apparently where we completely disagree. :-)
> >
> > Well, I wrote that code, I know what is considered "just a move" there.
>
> You lost some context: I'm comparing before/after the
> cc0-conversion for CRIS, where this is a misdetection (a false
> negative) of a move and causes a performance-regression.

The cc0 conversion caused a performance regression.  You can improve some
code in combine to make that not happen.

> > > I certainly don't contest that the move can be eliminated, and
> > > that most cost-effective 2-2 eliminations are helpful.  (See my
> > > other post about combine being eager with allowing same-cost
> > > combinations.)
> >
> > I did not see that post, do you have a pointer?
>
> https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549416.html

I'll reply to that separately.

> > single_set also allows other insns, for example, multiple sets!
>
> ...where the other sets are unused.  Not sure how that kind of
> thing would get combined here, but if its combined cost is
> beneficial, it would be a win, I guess.

The unused outputs are thrown away by combine, except when that doesn't
match, then it is tried again but with the original clobbers.

> > > Do you have some pointers to PR:s or something else backing that
> > > statement, or is it your work-in-progress hinted below?
> >
> > I do not know what "work in progress" you mean?
>
> I'm referring to your "I'll rerun some testing to show this.
> It'll take a while."  Are the regressions you refer to above
> tracked in bugzilla or on some mailing list?

The whole 2-2 combine thing took half a year at least to develop.  I
have posted to gcc-patches@ about it a few times.

Not having the is_just_move stuff causes highly visible regressions on
some targets, I do not remember which, but it hurts all targets.  2-2
combine with the is_just_move (and make_extra_copies) stuff was a win
everywhere.

The tests just take long to run (it used to be 2h per run, but it is
closer to 3h per run with the current compiler: building cross-compilers
used to be less than 4m per target, this is much worse since a few
years).

Analysing the results is easy for most kinds of instruction combination:
just looking at the binary size of the testcase (I usually build Linux)
gives a good idea how effective combine was.
But for 2-2 combinations that doesn't show much at all, so I dig through
the actual resulting code (for many targets, not all 30, just those I
think are interesting).

> > For 2-2, size does *not* usually change, which brings us immediately
> > into "a lot more work" territory.

Oh, and all x86 compilers ICE.  The x86-64 kernel *does* build, just some
boot wrapper code fails, but the kernel itself does build.  i386 does ICE
however, something with memcpy or some such.

> If combine only did lower-cost combinations (perhaps with
> Richard Sandifords lower-size-when-tied suggestion), I guess
> this wouldn't happen. 0:-)

And we would regress (a LOT).

> > It shows we can change to use single_set here.
>
> Did you mean "will show whether" or is it already complete?

It did complete, yes (and didn't change a single resulting instruction).
So that was easy :-)

> > I'll review the original patch again, to point out where it still needs
> > changing.
>
> ...but if you're in progress with a single_set variant, I'm all
> for it.

Yup, it's pretty simple actually :-)


Thanks,


Segher