http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54116
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-07-29 10:13:41 UTC ---
The tree level looks correct:

<bb 4>:
  # ivtmp.23_69 = PHI <ivtmp.23_68(5), ivtmp.23_66(3)>
  D.1771_65 = (void *) ivtmp.23_69;
  D.1717_6 = MEM[base: D.1771_65, offset: 0B];
  D.1722_10 = MEM[base: D.1771_65, offset: 4B];
  D.1723_11 = D.1722_10 | D.1717_6;
  D.1727_15 = MEM[base: D.1771_65, offset: 8B];
  D.1731_19 = MEM[base: D.1771_65, offset: 12B];
  D.1732_20 = D.1731_19 | D.1727_15;
  D.1733_21 = D.1732_20 | D.1723_11;
  ivtmp.23_68 = ivtmp.23_69 + 16;
  if (D.1733_21 != 0)
    goto <bb 6>;
  else
    goto <bb 5>;

<bb 5>:
  goto <bb 4>;

<bb 6>:
  # D.1723_22 = PHI <D.1723_11(4), D.1723_36(2)>
  # D.1732_23 = PHI <D.1732_20(4), D.1732_45(2)>
  D.1737_25 = D.1723_22 * D.1732_23;
  return D.1737_25;

--- CUT ----

Are you trying to say GCC should copy the loop header in this case? Or that
keeping the results of x[i]|x[i+1] and x[i+2]|x[i+3] around increases register
pressure and exposes issues with 2-operand machines in some cases?
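For readers following the dump above, here is a rough C sketch of what the loop appears to compute. It is a hypothetical reconstruction from the GIMPLE, not the reporter's testcase: the function name `or_pairs_product`, the `unsigned` element type (guessed from the 4-byte load offsets and the 16-byte step), and the exact loop shape are all assumptions.

```c
/* Hypothetical reconstruction of the loop in the dump; the name and the
   element type are assumptions, not taken from the bug report. */
unsigned or_pairs_product(const unsigned *x)
{
    unsigned lo, hi;
    unsigned i = 0;

    for (;;) {
        lo = x[i] | x[i + 1];       /* corresponds to D.1723 */
        hi = x[i + 2] | x[i + 3];   /* corresponds to D.1732 */
        if ((lo | hi) != 0)         /* D.1733 != 0 leaves the loop */
            break;
        i += 4;                     /* ivtmp advances by 16 bytes */
    }
    return lo * hi;                 /* corresponds to D.1737 */
}
```

Both partial ORs are live across the loop exit because the final multiply uses them, which is the register-pressure concern raised in the question above. The caller must guarantee that some group of four elements is not all zero, otherwise the loop reads past the end of the array.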