------- Comment #1 from rask at gcc dot gnu dot org  2007-11-14 01:44 -------
With -S -dp it is clear that only byte0 is optimized:

byte0:
        movzbl  4(%esp), %eax   # 11    *movqi_1/3
byte1:
        movl    4(%esp), %eax   # 24    *movsi_1/1
        movl    8(%esp), %edx   # 25    *movsi_1/1
        shrdl   $8, %edx, %eax  # 30    x86_shrd_1/1
byte6:
        movzwl  10(%esp), %eax  # 24    *zero_extendhisi2_movzwl
byte7:
        movzbl  11(%esp), %eax  # 28    *zero_extendqisi2_movzbw

They should all be optimized to use movqi.

The first part of the problem is that any of cse, cse2, gcse or fwprop will combine these instructions

(insn 7 6 8 2 /tmp/pr34072.c:3 (set (reg:QI 60)
        (subreg:QI (reg:SI 64) 0)) 62 {*movqi_1} (nil))

(insn 8 7 12 2 /tmp/pr34072.c:3 (set (reg:QI 58 [ <result> ])
        (reg:QI 60)) 62 {*movqi_1} (nil))

(insn 12 8 18 2 /tmp/pr34072.c:3 (set (reg/i:QI 0 ax)
        (reg:QI 58 [ <result> ])) 62 {*movqi_1} (nil))

into

(insn 12 8 18 2 /tmp/pr34072.c:3 (set (reg/i:QI 0 ax [ <result> ])
        (subreg:QI (reg:SI 64) 0)) 62 {*movqi_1} (nil))

and then combine won't touch it because of the hard register (ax) and SMALL_REGISTER_CLASSES and/or CLASS_LIKELY_SPILLED. The fix is to teach these passes not to combine these insns, as demonstrated using -fno-forward-propagate -fno-gcse -fno-rerun-cse-after-loop -fno-cse[1]:

byte6:
        movzbl  10(%esp), %eax  # 8     *movqi_1/3
byte7:
        movzbl  11(%esp), %eax  # 8     *movqi_1/3

byte1 is still not optimized because we're failing to simplify this instruction in combine:

(set (reg:QI 60)
    (subreg:QI (lshiftrt:DI (mem/c/i:DI (reg/f:SI 16 argp) [2 x+0 S8 A32])
            (const_int 8 [0x8])) 0))

It should be entirely possible to simplify it to this:

(set (reg:QI 60)
    (mem/c/i:QI (plus:SI (reg/f:SI 16 argp)
            (const_int 1))))

[1] An option I hacked in to debug this problem.

--

rask at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
          Component|target                      |rtl-optimization
     Ever Confirmed|0                           |1
           Keywords|                            |missed-optimization
      Known to fail|                            |4.3.0
   Last reconfirmed|0000-00-00 00:00:00         |2007-11-14 01:44:03
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34072
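The byte1 simplification relies on x86 being little-endian: shifting a 64-bit value right by 8*k bits and truncating to 8 bits yields the same byte as a one-byte load at offset k from the value's address. The C sketch below checks that equivalence. The test case itself is not reproduced in this comment, so byte_by_shift, byte_by_load, host_is_little_endian and shift_matches_load are hypothetical helpers written only to illustrate the shift form versus the memory-load form.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the byteN functions: byte k of a 64-bit
   value via a shift plus truncation, i.e. the
   (subreg:QI (lshiftrt:DI x 8*k) 0) shape combine currently sees.  */
unsigned char byte_by_shift(uint64_t x, unsigned k)
{
    assert(k < 8);
    return (unsigned char) (x >> (8 * k));
}

/* Hypothetical stand-in for the desired result: a single byte load at
   offset k from the value's address, i.e. (mem:QI (plus addr k)).
   Reading an object through unsigned char * is always valid in C.  */
unsigned char byte_by_load(const uint64_t *p, unsigned k)
{
    assert(k < 8);
    return ((const unsigned char *) p)[k];
}

/* Returns 1 on a little-endian host such as x86.  */
int host_is_little_endian(void)
{
    const uint16_t one = 1;
    return ((const unsigned char *) &one)[0] == 1;
}

/* Returns 1 if the shift form and the load form agree for all eight
   bytes of x.  This holds for every x on a little-endian host.  */
int shift_matches_load(uint64_t x)
{
    for (unsigned k = 0; k < 8; k++)
        if (byte_by_shift(x, k) != byte_by_load(&x, k))
            return 0;
    return 1;
}
```

On a big-endian target the byte at offset k would instead be byte 7 - k of the value, so any such simplification in combine has to take BYTES_BIG_ENDIAN into account rather than always using offset k.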