http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50984

Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2011-11-03
          Component|target                      |rtl-optimization
     Ever Confirmed|0                           |1

--- Comment #2 from Uros Bizjak <ubizjak at gmail dot com> 2011-11-03 20:01:14 UTC ---
(In reply to comment #1)
> IIRC this is a target issue.

Partially true....

Current tree generates:

    xorl    %eax, %eax    # 50    *movdi_xor    [length = 2]
    andl    $8, %edi    # 55    *andsi_2/1    [length = 3]
    je    .L2    # 10    *jcc_1    [length = 2]
    xorl    %eax, %eax    # 46    *movsi_xor    [length = 2]
    andl    $4, %esi    # 54    *andsi_2/1    [length = 3]
    setne    %al    # 48    *setcc_qi_slp    [length = 3]
.L2:
    rep    # 56    simple_return_internal_long    [length = 2]
    ret
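
For orientation, a function of roughly the following shape would be consistent
with the assembly above. This is only a sketch: the test case is not quoted in
this comment, so the function name and body are assumptions, not the
reporter's code.

```c
/* Hypothetical reproducer, reconstructed from the assembly listing only.
   Under the x86-64 SysV ABI, a arrives in %edi and b in %esi.  */
int
both_bits_set (int a, int b)
{
  /* Short-circuit AND of two single-bit tests: the "je .L2" skips the
     second test when (a & 8) is zero, leaving the zeroed %eax as result.  */
  return (a & 8) && (b & 4);
}
```

Compiling something like this with -O2 -S should show whether the register
clear is emitted once or twice.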

The first XOR is in fact a load of zero in DImode; the second XOR is a load of
zero in SImode. This all happens in the peephole2 pass, which converts

    5 ax:SI=0
      REG_EQUAL: 0
  ...
   40 ax:QI=flags:CCZ!=0
      REG_DEAD: flags:CCZ
   41 ax:SI=zero_extend(ax:QI)

to

   50 {ax:DI=0;clobber flags:CC;}

  ...
   46 {ax:SI=0;clobber flags:CC;}
   49 {flags:CCZ=cmp(si:QI&0x4,0);si:QI=si:QI&0x4;}
   48 strict_low_part=flags:CCZ!=0

We can perform both clears in SImode, as with the following patch:

--cut here--
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md    (revision 180840)
+++ config/i386/i386.md    (working copy)
@@ -17331,7 +17331,7 @@
    && peep2_regno_dead_p (0, FLAGS_REG)"
   [(parallel [(set (match_dup 0) (const_int 0))
           (clobber (reg:CC FLAGS_REG))])]
-  "operands[0] = gen_lowpart (word_mode, operands[0]);")
+  "operands[0] = gen_lowpart (SImode, operands[0]);")

 (define_peephole2
   [(set (strict_low_part (match_operand 0 "register_operand" ""))
--cut here--

This results in the same assembly, but the _.202r.peephole2 dump is now:

   50 {ax:SI=0;clobber flags:CC;}
  ...
   46 {ax:SI=0;clobber flags:CC;}
   49 {flags:CCZ=cmp(si:QI&0x4,0);si:QI=si:QI&0x4;}
   48 strict_low_part=flags:CCZ!=0

I'd expect the CE3 pass that follows the peephole2 pass to eliminate (insn 46),
but for some reason this doesn't happen.
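
To illustrate why (insn 46) looks redundant, here is a small C model of the
emitted sequence, with and without the second clear. It is a sketch of the
register-level behaviour only, not of what any GCC pass does; the function
names are made up for this illustration.

```c
#include <stdint.h>

/* Models the current output, including the second clear (insn 46).  */
uint32_t
model_with_second_clear (uint32_t edi, uint32_t esi)
{
  uint32_t eax = 0;                 /* insn 50: xorl %eax, %eax */
  if ((edi & 8) == 0)               /* andl $8, %edi; je .L2 */
    return eax;
  eax = 0;                          /* insn 46: xorl %eax, %eax */
  esi &= 4;                         /* insn 49: andl $4, %esi */
  /* insn 48: setne %al writes only the low byte (strict_low_part).  */
  eax = (eax & ~UINT32_C (0xff)) | (uint32_t) (esi != 0);
  return eax;
}

/* Same sequence with insn 46 dropped: %eax is still zero from insn 50
   when the fall-through path is taken.  */
uint32_t
model_without_second_clear (uint32_t edi, uint32_t esi)
{
  uint32_t eax = 0;                 /* insn 50: xorl %eax, %eax */
  if ((edi & 8) == 0)               /* andl $8, %edi; je .L2 */
    return eax;
  esi &= 4;                         /* insn 49: andl $4, %esi */
  eax = (eax & ~UINT32_C (0xff)) | (uint32_t) (esi != 0);
  return eax;
}
```

Both models return the same value for every input, since nothing between
(insn 50) and (insn 46) writes %eax.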

Confirmed as rtl-optimization issue.
