On 01/07/2012 12:24 AM, Peter Bergner wrote:
Hi Vlad,

While debugging a slightly modified version of the test case in PR16458:

   int
   foo (unsigned int a, unsigned int b)
   {
     if (a == b) return 1;
     if (a > b)  return 2;
     if (a < b)  return 3;
     if (a != b) return 4;
     return 0;
   }

I noticed a couple of ugly code gen warts which I tracked back to IRA.
Namely, compiling the above with -O2 -m32 on powerpc64-linux, I'm seeing:

        li 9,3
        mr 3,9
        blr
and:
        li 9,1
        mr 3,9
        blr

If we look at the rtl just before IRA, we have the following:

BB2:
   (set (reg/v:SI 122 [ a ]) (reg:SI 3 3 [ a ]))                                
        REG_DEAD (reg:SI 3 3 [ a ])
   (set (reg/v:SI 123 [ b ]) (reg:SI 4 4 [ b ]))                                
        REG_DEAD (reg:SI 4 4 [ b ])
   (set (reg:CC 124) (compare:CC (reg/v:SI 122 [ a ]) (reg/v:SI 123 [ b ])))
   (if_then_else (eq (reg:CC 124) (const_int 0 [0]))
     goto BB6;

BB3:
   (set (reg:CCUNS 125) (compare:CCUNS (reg/v:SI 122 [ a ]) (reg/v:SI 123 [ b ])))
        REG_DEAD (reg/v:SI 123 [ b ])
        REG_DEAD (reg/v:SI 122 [ a ])
   (set (reg:SI 120 [ D.1379 ]) (const_int 2 [0x2]))
   (if_then_else (gtu (reg:CC 124) (const_int 0 [0]))
     goto BB8;

BB4:
   (if_then_else (geu (reg:CC 124) (const_int 0 [0]))
     goto BB7;

BB5:
   (set (reg:SI 120 [ D.1379 ]) (const_int 3 [0x3]))
   goto BB8;

BB6:
   (set (reg:SI 120 [ D.1379 ]) (const_int 1 [0x1]))
   goto BB8;

BB7:
   (set (reg:SI 120 [ D.1379 ]) (const_int 4 [0x4]))

BB8:
   (set (reg/i:SI 3 3) (reg:SI 120 [ D.1379 ])) REG_DEAD (reg:SI 120 [ D.1379 ])
   (use (reg/i:SI 3 3))
   return

When we start coloring the allocnos, we get the following:

Pass 1 for finding pseudo/allocno costs

     r125: preferred CR_REGS, ...
     r124: preferred CR_REGS, ...
     r123: preferred GENERAL_REGS, ...
     r122: preferred GENERAL_REGS, ...
     r120: preferred GENERAL_REGS, ...

...

       Popping a3(r122,l0)  -- assign reg 3
       Popping a2(r123,l0)  -- assign reg 4
       Popping a0(r120,l0)  -- assign reg 9
       Popping a4(r124,l0)  -- assign reg 75
       Popping a1(r125,l0)  -- assign reg 3
Assigning 75 to a1r125

This looks a little startling, since we're initially assigning r125 to r3,
even though its preferred class is CR_REGS, before improve_allocation()
saves us and reassigns r125 to r75 (a real CR reg).  The reason r125
ends up initially in r3 is that we detect a "shuffle" copy during the
set of r125, because r122 (and r123) dies in the insn r125 is defined in.
This ends up preferencing the costs for r125, such that it wants r3.
This in turn via ALLOCNO_UPDATED_HARD_REG_COSTS() increases the cost
of assigning r120 to r3, such that r120 ends up with r9 instead, when
we really really want it to get r3.
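
To make the cost interaction described above concrete, here is a toy sketch in C.  It is not IRA's code: the two-register model, the function names and the cost numbers are all made up for illustration.  It only shows how a preference created by a copy on one allocno can raise the cost of the same hard reg for a conflicting allocno.

```c
#include <assert.h>

/* Toy model with two hard regs only: index 0 stands for r3, index 1 for r9.
   All names and numbers here are hypothetical, not taken from IRA.  */
enum { TOY_NUM_REGS = 2 };
static const int toy_hard_regno[TOY_NUM_REGS] = { 3, 9 };

/* Return the hard reg number with the lowest cost; ties go to the
   lowest index.  */
int
toy_pick_cheapest (const int cost[TOY_NUM_REGS])
{
  int best = 0;
  for (int i = 1; i < TOY_NUM_REGS; i++)
    if (cost[i] < cost[best])
      best = i;
  return toy_hard_regno[best];
}

/* A copy connecting us to hard reg index IDX makes IDX cheaper by FREQ.  */
void
toy_apply_copy_preference (int cost[TOY_NUM_REGS], int idx, int freq)
{
  assert (idx >= 0 && idx < TOY_NUM_REGS);
  cost[idx] -= freq;
}

/* A conflicting allocno that prefers IDX makes IDX more expensive for us.  */
void
toy_apply_conflict_penalty (int cost[TOY_NUM_REGS], int idx, int penalty)
{
  assert (idx >= 0 && idx < TOY_NUM_REGS);
  cost[idx] += penalty;
}

/* Replay the r120 decision.  WITH_SHUFFLE_COPY says whether the
   conflicting r125 was given a preference for r3 by a shuffle copy.  */
int
toy_assign_r120 (int with_shuffle_copy)
{
  int r120_cost[TOY_NUM_REGS] = { 0, 0 };

  /* r120 is copied into the return register r3, so r3 gets cheaper.  */
  toy_apply_copy_preference (r120_cost, 0, 1);

  /* r125 conflicts with r120; if it prefers r3, r3 gets dearer for r120.  */
  if (with_shuffle_copy)
    toy_apply_conflict_penalty (r120_cost, 0, 2);

  return toy_pick_cheapest (r120_cost);
}
```

In this toy, toy_assign_r120 (1) picks r9 and toy_assign_r120 (0) picks r3, which mirrors the "li 9,N; mr 3,9" versus plain "li 3,N" difference seen in the generated code.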

Thanks for the analysis, Peter.

Your comments about the "shuffle" copies seem to imply that they're being
used to try and help insns with two operand constraints, but in the case
above, they're over-preferencing things.  As an experiment, I disabled all
shuffle copies and the code gen for the test case above is much improved.

Do we really need or want to create shuffle copies for insns that do not
have a two operand constraint?

Yes, I think so. As I remember, I did some benchmarking and it gave some "order" in hard register assignments and improved code slightly (at least for SPEC2000), even for 3-operand insn architectures.

If not, do you know how we can test for that?
If you think we do need that for non two operand constraint insns, can we
at least disable creating shuffle copies for allocnos that have different
preferred classes, since they're probably not going to be assigned the
same hard reg?

I guess we could try this and it might work.

Ala:

Index: ira-conflicts.c
===================================================================
--- ira-conflicts.c     (revision 182936)
+++ ira-conflicts.c     (working copy)
@@ -397,6 +397,11 @@ process_regs_for_copy (rtx reg1, rtx reg
    enum machine_mode mode;
    ira_copy_t cp;

+  if (!constraint_p
+      && reg_preferred_class (REGNO (reg1))
+        != reg_preferred_class (REGNO (reg2)))
+    return false;
+
    gcc_assert (REG_SUBREG_P (reg1) && REG_SUBREG_P (reg2));
    only_regs_p = REG_P (reg1) && REG_P (reg2);
    reg1 = go_through_subreg (reg1, &offset1);
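
The test the patch adds can be looked at in isolation.  The following is a standalone sketch of just that predicate, using made-up stand-ins (toy_reg_class, toy_want_copy) rather than GCC's real types and functions; it assumes, as the patch does, that constraint_p is true for copies coming from a two operand constraint and false for shuffle copies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's register classes.  */
enum toy_reg_class { TOY_GENERAL_REGS, TOY_CR_REGS };

/* Return true if a copy between two pseudos should be recorded.
   Copies that do not come from a constraint (shuffle copies) are
   dropped when the two pseudos prefer different register classes,
   since they are unlikely to end up in the same hard reg.  */
bool
toy_want_copy (bool constraint_p,
               enum toy_reg_class class1, enum toy_reg_class class2)
{
  if (!constraint_p && class1 != class2)
    return false;
  return true;
}

/* Small self check covering the r122/r125 case from the test above.  */
void
toy_self_check (void)
{
  /* r122 prefers GENERAL_REGS, r125 prefers CR_REGS: no shuffle copy.  */
  assert (!toy_want_copy (false, TOY_GENERAL_REGS, TOY_CR_REGS));
  /* Same preferred class: the shuffle copy is still created.  */
  assert (toy_want_copy (false, TOY_GENERAL_REGS, TOY_GENERAL_REGS));
}
```

Note that copies that do come from a constraint are kept regardless of class, so under this sketch only the shuffle copies are filtered.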


Your thoughts?


Your patch might work. But we need to test it on the major 2-operand architectures x86/x86-64 and on the 3-operand ppc (I believe SPEC2000 would be OK for this).

