https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68695

--- Comment #23 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #22)
> Going back to variants of the original testcase:
> int
> foo (int x, int y, int a)
> {
>   int i = x;
>   int j = y;
> #ifdef EX1
>   if (__builtin_expect (x > y, 1))
> #elif defined EX0
>   if (__builtin_expect (x > y, 0))
> #else
>   if (x > y)
> #endif
>     {
>       i = a;
>       j = i;
>     }
>   return i * j;
> }
> 
> at least for the -DEX1 case I'd think it is reasonable to assign one of i or
> j to the register holding argument a (%r4), because that will for the common
> case need one fewer register move.  But, there is one further constraint. 
> The multiplication wants the result to live in %r2, because then it can
> avoid a move, and the multiplication is commutative two operand one, so one
> of the operands needs to match the output.  Thus, from this the reasonable
> disposition choices are either i in %r2 and j in %r3 (this is especially
> desirable if the if is unlikely, i.e. -DEX0 case), or perhaps i in %r2 and j
> in %r4 (while this will need if/then/else rather than if/then only, it would
> use one fewer move in the expected likely block).
> The problem is that IRA chooses i in %r4 and j in %r3, so there are 2 moves
> even in the fast path (the i = a assignment is a nop, but j = i is needed),
> and then later on an extra reload move is added, because pseudo 67's
> disposition (result of mulsi3) is properly in %r2, and if one argument is in
> %r4 and the other in %r3, it needs to copy one to %r2.
> 
> Vlad, can you please have a look?

I investigated it.  I don't think that the desired results can be achieved
without changing existing heuristics.  Changing the existing heuristics is a
big job.  Besides rewriting code, it requires a lot of benchmarking.

Also, part of the problem could be solved if we used conflicts built from value
numbering instead of live-range analysis.  I don't know of any RA whose conflict
graph is based on value numbering.
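
To make the distinction concrete, here is a toy sketch for the testcase above.
It is not IRA code: the pseudo names, the copy lists and every function name
are hypothetical, and it only models the copies on each path plus liveness at
the multiplication.  A live-range conflict means both pseudos are live at the
same point; the value-based variant additionally requires that they hold
different values there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pseudos for the testcase: arguments x, y, a and locals i, j. */
enum pseudo { X, Y, A, I, J, N_PSEUDOS };

struct copy { enum pseudo dst, src; };

/* Copies executed before "return i * j" on each path (assumed model). */
static const struct copy fast_path[] = { {I, X}, {J, Y}, {I, A}, {J, I} };
static const struct copy slow_path[] = { {I, X}, {J, Y} };

/* Only i and j are live at the multiplication. */
static const bool live_at_mul[N_PSEUDOS] = { [I] = true, [J] = true };

/* Give every pseudo its own value number, then propagate through copies. */
void number_values(const struct copy *copies, size_t n, int vn[N_PSEUDOS])
{
  assert(copies != NULL || n == 0);
  for (int p = 0; p < N_PSEUDOS; p++)
    vn[p] = p;
  for (size_t k = 0; k < n; k++)
    vn[copies[k].dst] = vn[copies[k].src];
}

/* Conflict as derived from live ranges: both pseudos are live at the point. */
bool conflict_by_live_range(enum pseudo p, enum pseudo q)
{
  return p != q && live_at_mul[p] && live_at_mul[q];
}

/* Value-based conflict of i and j: both live and holding different values. */
bool i_j_conflict_by_value(const struct copy *path, size_t n)
{
  int vn[N_PSEUDOS];
  number_values(path, n, vn);
  return conflict_by_live_range(I, J) && vn[I] != vn[J];
}
```

In this model i and j always conflict by live range, but by value they
conflict only on the path that skips the if body; on the other path both hold
the value of a.  This is consistent with only part of the problem being
solvable this way.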

What I don't like is that p67 originally got r4 and was then changed to r2
in improve_allocation.  Fixing that will improve the code, but not for all
possible combinations (EX1, EX0, etc.).  In any case, I'll continue my work on
the PR.
