http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55278
--- Comment #14 from Uros Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #12)
> (force gcc to avoid xorw memory, %hireg and instead use movzwl memory,
> %sireg; ... xorl %sireg, %sireg2) and p2 was something similar for *xorqi_1.
>
> Looking at icc generated assembly, it is interesting to see that the only
> HImode instructions it ever uses are rolw and movw stores, for everything
> else it uses movzwl loads and SImode arithmetics (well, I guess shift right
> shrw/sarw/rorw can't be avoided either). Similarly, icc on the testcase
> doesn't emit any QImode instructions at all, while gcc emits tons of them
> and llvm something in between.
>
> So perhaps this bug is not about LRA, but about instruction selection, and
> when not optimizing for size at least on some CPUs we should consider using
> SImode arithmetics instead of QImode/HImode much more aggressively than we
> do now.
> Not sure if it is better done by (Kai's?) type optimization pass, which
> shortly before expansion using target hints would just try to get rid of as
> many QImode and especially HImode operations as possible, guess we can often
> keep complete garbage in the upper bits, or if it is better done at the *.md
> level.

Please note that it is possible to tune the use of HImode and QImode arithmetic
with X86_TUNE_QIMODE_MATH and X86_TUNE_HIMODE_MATH. Also,
X86_TUNE_PROMOTE_QI_REGS, X86_TUNE_PROMOTE_HI_REGS and possibly
X86_TUNE_PARTIAL_REG_STALL can be used to fine-tune the use of partial
registers.
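
To illustrate the "garbage in the upper bits" point from the quoted comment, here is a minimal C sketch. It is not taken from the testcase in this bug, and the function names (xor_narrow, xor_wide, check_equivalence) are made up for illustration. It only models the idea at the source level: a zero-extending load, a 32-bit XOR against an operand whose upper half is deliberately junk, and a 16-bit store give the same result as a 16-bit XOR. Which instructions gcc actually emits for either form depends on the target and tuning, so no particular assembly output is claimed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Narrow form: the XOR is expressed on 16-bit values
   (roughly what "xorw memory, %hireg" computes). */
static void xor_narrow(uint16_t *dst, uint16_t src)
{
    *dst = (uint16_t)(*dst ^ src);
}

/* Wide form: zero-extend the memory operand (as movzwl would),
   XOR in 32 bits, then store only the low 16 bits (as movw would).
   The upper 16 bits of src_with_garbage are never observed. */
static void xor_wide(uint16_t *dst, uint32_t src_with_garbage)
{
    uint32_t wide = *dst;       /* zero-extending load */
    wide ^= src_with_garbage;   /* 32-bit XOR, upper half is junk */
    *dst = (uint16_t)wide;      /* truncating 16-bit store */
}

/* Compare both forms on a few sample values, with arbitrary junk
   (0xABCD) placed in the upper half of the wide operand. */
static void check_equivalence(void)
{
    static const uint16_t samples[] = {
        0x0000u, 0x0001u, 0x00ffu, 0x8000u, 0xffffu, 0x1234u
    };
    const size_t n = sizeof samples / sizeof samples[0];

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            uint16_t a = samples[i];
            uint16_t b = samples[i];
            xor_narrow(&a, samples[j]);
            xor_wide(&b, 0xABCD0000u | samples[j]);
            assert(a == b);
        }
    }
}
```

The same reasoning holds for other operations whose low 16 result bits depend only on the low 16 input bits (and, or, add, sub). It does not hold for right shifts and rotates, where upper bits move into the low half, which is consistent with the remark above that shrw/sarw/rorw cannot be avoided.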