------- Comment #9 from wilson at gcc dot gnu dot org 2005-10-14 17:44 -------
The cause of this problem is the following two lines in the i386.c file:

    const int x86_himode_math = ~(m_PPRO);
    const int x86_promote_hi_regs = m_PPRO;

They were added here:
http://gcc.gnu.org/ml/gcc-patches/2000-02/msg00890.html

The reason for this is, as a previous comment mentioned, that HImode instructions are slow on the Pentium Pro and should be avoided. Doing this gives better performance in general, but unfortunately, for this particular testcase, it causes us to miss an optimization.

The issue in this case is a combiner limit. If you compile for Pentium, you get

    (set (reg:HI 61)
         (and:HI (mem/c/i:HI (symbol_ref:SI ("y"))) (const_int -256)))
    (set (reg:HI 63)
         (ior:HI (reg:HI 61) (reg:HI 62)))
    (set (mem/c/i:HI (symbol_ref:SI ("y")))
         (reg:HI 63))

The combiner combines these 3 instructions to get

    (set (mem/c/i:QI (const:SI (plus:SI (symbol_ref:SI ("x")) (const_int 1 [0x1]))))
         (subreg:QI (reg:SI 59 [ c ])))

However, for Pentium Pro, we end up with 4 instructions due to the HImode promotion.

    (set (reg:HI 61 [ y ])
         (mem/c/i:HI (symbol_ref:SI ("y"))))
    (set (reg:SI 62)
         (and:SI (subreg:SI (reg:HI 61 [ y ]) 0) (const_int -256 [0xffffffffffffff00])))
    (set (reg:SI 65)
         (ior:SI (reg:SI 62) (subreg:SI (reg:HI 63 [ c ]) 0)))
    (set (mem/c/i:HI (symbol_ref:SI ("y")))
         (subreg:HI (reg:SI 65) 0))

The combiner combines at most 3 instructions, to avoid combinatorial explosion. The promotion splits the load into a separate instruction, so the sequence is now one instruction too long and we are not able to optimize this.

I'll look at this a bit more, but at the moment, I'm skeptical that there is any easy solution.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15184
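
For readers unfamiliar with the tuning masks quoted at the top of this comment, here is a minimal sketch of how such per-processor bitmasks are typically tested. The enum, the `TUNE_*` names, and the two helper functions are hypothetical and only illustrate the idea; the real definitions in i386.h and i386.c cover many more processors and may differ in detail.

```c
/* Hypothetical, reduced processor list; not the real GCC enumeration. */
enum tune_cpu { TUNE_PENTIUM, TUNE_PENTIUMPRO };

/* One bit per processor, in the style of the m_* masks in i386.c. */
#define m_PENT (1 << TUNE_PENTIUM)
#define m_PPRO (1 << TUNE_PENTIUMPRO)

/* The two lines from the comment: HImode math is allowed on every
   processor except the Pentium Pro, and HImode registers are promoted
   to SImode only on the Pentium Pro. */
static const int x86_himode_math = ~(m_PPRO);
static const int x86_promote_hi_regs = m_PPRO;

/* Hypothetical helpers: test a mask against the processor being tuned for. */
int himode_math_enabled(enum tune_cpu tune)
{
    return (x86_himode_math & (1 << tune)) != 0;
}

int promote_hi_regs_enabled(enum tune_cpu tune)
{
    return (x86_promote_hi_regs & (1 << tune)) != 0;
}
```

Under these assumptions, tuning for Pentium keeps HImode arithmetic and yields the 3-instruction sequence, while tuning for Pentium Pro turns promotion on and yields the 4-instruction sequence that the combiner cannot handle.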