https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68695
--- Comment #23 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #22)
> Going back to variants of the original testcase:
> int
> foo (int x, int y, int a)
> {
>   int i = x;
>   int j = y;
> #ifdef EX1
>   if (__builtin_expect (x > y, 1))
> #elif defined EX0
>   if (__builtin_expect (x > y, 0))
> #else
>   if (x > y)
> #endif
>     {
>       i = a;
>       j = i;
>     }
>   return i * j;
> }
>
> at least for the -DEX1 case I'd think it is reasonable to assign one of i or
> j to the register holding argument a (%r4), because that will for the common
> case need one fewer register move. But, there is one further constraint.
> The multiplication wants the result to live in %r2, because then it can
> avoid a move, and the multiplication is commutative two operand one, so one
> of the operands needs to match the output. Thus, from this the reasonable
> disposition choices are either i in %r2 and j in %r3 (this is especially
> desirable if the if is unlikely, i.e. -DEX0 case), or perhaps i in %r2 and j
> in %r4 (while this will need if/then/else rather than if/then only, it would
> use one fewer move in the expected likely block).
> The problem is that IRA chooses i in %r4 and j in %r3, so there are 2 moves
> even in the fast path (the i = a assignment is a nop, but j = i is needed,
> and then later on extra reload move is added, because pseudo 67 disposition
> (result of mulsi3) is properly in %r2 and if one argument is %r4 and another
> one %r3, it needs to copy one to %r2.
>
> Vlad, can you please have a look?

I investigated it. I don't think that the desired results can be achieved
without changing the existing heuristics. Changing the existing heuristics is
a big job. Besides rewriting code, it requires a lot of benchmarking.

Also, part of the problem could be solved if we used conflicts built using
value numbering instead of live-range analysis. I don't know of an RA whose
conflict graph is based on value numbering.

What I don't like is that p67 originally got r4 and then it was changed to r2
in improve_allocation. Fixing that will improve the code, but not for all
possible combinations (EX1, EX0, etc.). In any case, I'll continue my work on
the PR.
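
To illustrate the difference between conflicts built from live ranges and
conflicts built from value numbering, here is a minimal toy sketch in C. It is
not IRA code: the instruction encoding, the names (struct insn, prog,
build_conflicts, live_conflict, value_conflict) and the value-numbering scheme
are all assumptions made up for this example, and it models only the taken
branch of the testcase (i = a; j = i;) followed by the multiplication, as
straight-line code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, NOT IRA code: every name here is hypothetical.  */
enum { A, I, J, NVARS };   /* stand-ins for the pseudos of a, i, j */
#define NONE (-1)

struct insn
{
  int dst;         /* variable defined by the insn, or NONE */
  int src1, src2;  /* variables used by the insn, or NONE */
  bool is_copy;    /* true for a plain "dst = src1" */
};

/* Taken branch plus the multiplication: i = a; j = i; use of i and j.  */
static const struct insn prog[] = {
  { I, A, NONE, true },
  { J, I, NONE, true },
  { NONE, I, J, false },
};
enum { NINSNS = sizeof prog / sizeof prog[0] };

static bool live_conflict[NVARS][NVARS];   /* simultaneously live */
static bool value_conflict[NVARS][NVARS];  /* live and different values */

static void
build_conflicts (void)
{
  int vn[NINSNS][NVARS];  /* value number of each variable after insn k */
  int cur[NVARS], next_vn = NVARS;
  bool live[NVARS] = { false };  /* nothing is live after the last insn */

  /* Forward pass: a copy propagates the value number of its source,
     any other definition creates a fresh value number.  */
  for (int v = 0; v < NVARS; v++)
    cur[v] = v;
  for (int k = 0; k < NINSNS; k++)
    {
      if (prog[k].dst != NONE)
        {
          assert (!prog[k].is_copy || prog[k].src1 != NONE);
          cur[prog[k].dst] = prog[k].is_copy ? cur[prog[k].src1] : next_vn++;
        }
      for (int v = 0; v < NVARS; v++)
        vn[k][v] = cur[v];
    }

  /* Backward pass: at each definition, the defined variable conflicts
     with everything live after the insn.  The value-based variant drops
     the conflict when both variables hold the same value there.  */
  for (int k = NINSNS - 1; k >= 0; k--)
    {
      int d = prog[k].dst;
      if (d != NONE)
        {
          for (int v = 0; v < NVARS; v++)
            if (v != d && live[v])
              {
                live_conflict[d][v] = live_conflict[v][d] = true;
                if (vn[k][v] != vn[k][d])
                  value_conflict[d][v] = value_conflict[v][d] = true;
              }
          live[d] = false;
        }
      if (prog[k].src1 != NONE)
        live[prog[k].src1] = true;
      if (prog[k].src2 != NONE)
        live[prog[k].src2] = true;
    }
}
```

After calling build_conflicts (), live_conflict[I][J] is set because i is
still live when j is defined, while value_conflict[I][J] stays clear because
both hold the value of a at that point. In the full function the other path
gives i and j different values (x and y), which is consistent with value-based
conflicts addressing only part of the problem.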