https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68695
--- Comment #22 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Going back to variants of the original testcase:

int
foo (int x, int y, int a)
{
  int i = x;
  int j = y;
#ifdef EX1
  if (__builtin_expect (x > y, 1))
#elif defined EX0
  if (__builtin_expect (x > y, 0))
#else
  if (x > y)
#endif
    {
      i = a;
      j = i;
    }
  return i * j;
}

At least for the -DEX1 case I'd think it is reasonable to assign one of i or j
to the register holding argument a (%r4), because that will need one fewer
register move in the common case.

But there is one further constraint. The multiplication wants the result to
live in %r2, because then it can avoid a move, and the multiplication is a
commutative two-operand instruction, so one of the operands needs to match the
output.

Thus, the reasonable disposition choices are either i in %r2 and j in %r3
(this is especially desirable if the if is unlikely, i.e. the -DEX0 case), or
perhaps i in %r2 and j in %r4 (while this will need if/then/else rather than
if/then only, it would use one fewer move in the expected likely block).

The problem is that IRA chooses i in %r4 and j in %r3, so there are 2 moves
even in the fast path: the i = a assignment is a nop, but j = i is needed, and
then later on an extra reload move is added, because the disposition of pseudo
67 (the result of mulsi3) is properly in %r2, and if one argument is in %r4 and
the other in %r3, one of them has to be copied to %r2.

Vlad, can you please have a look?
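
For reference, the move counts in the reasoning above can be reproduced with a toy model. This is only a sketch, not how IRA computes its costs: it assumes x arrives in %r2, y in %r3 and a in %r4, that the product has to end up in %r2, and that the i in %r2 / j in %r4 case is emitted as if/then/else. The names moves_taken, moves_not_taken and mul_fixup are made up for the illustration.

```c
/* Toy move-count model, not GCC's actual IRA cost computation.
   Assumptions: x arrives in %r2, y in %r3, a in %r4, and the
   product must end up in %r2.  */

enum { R2 = 2, R3 = 3, R4 = 4 };

/* One extra copy is needed before the two-operand multiply when
   neither input already lives in the result register %r2.  */
static int
mul_fixup (int reg_i, int reg_j)
{
  return (reg_i != R2 && reg_j != R2) ? 1 : 0;
}

/* Register moves on the path where x > y (i = a; j = i;).  */
int
moves_taken (int reg_i, int reg_j)
{
  int moves = 0;
  if (reg_i != R4)                    /* i = a: free only if i shares %r4 with a */
    moves++;
  if (reg_j != R4 && reg_j != reg_i)  /* j = i: free if j already holds a's value */
    moves++;
  return moves + mul_fixup (reg_i, reg_j);
}

/* Register moves on the path where the condition is false
   (i keeps x, j keeps y).  */
int
moves_not_taken (int reg_i, int reg_j)
{
  int moves = 0;
  if (reg_i != R2)  /* i = x */
    moves++;
  if (reg_j != R3)  /* j = y */
    moves++;
  return moves + mul_fixup (reg_i, reg_j);
}
```

Under these assumptions, i in %r2 / j in %r3 costs 2 moves when the branch is taken and 0 otherwise, i in %r2 / j in %r4 costs 1 move on either path, and the IRA choice (i in %r4, j in %r3) costs 2 moves on the taken path, which matches the count given above. The model also yields 2 moves for the not-taken path of the IRA choice; that figure is an extrapolation of the model, not something measured.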