Hi Richard, I assume that this fix does not affect on code size since such pattern happens very rare although I can add a check on it if you insist. Register pressure is not a issue here since I assume that additional fill won't affect on performance as cmove with memory operand. I decided to not change machine description of cmov (prohibit memory operand0 since it is very risky but only did a simple change to not produce such instruction during forward substitution (aka combine) phase.
Best regards. Yuri. 2012/12/12 Richard Biener <richard.guent...@gmail.com>: > On Wed, Dec 12, 2012 at 12:47 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >> Hi Uros, >> >> This fix is for all x86 platforms, we tested it on core2/corei7, >> atom/atom2 and AMD and got performance improvement +6% -- +11%. So I >> don' think we need to introduce additioanl tune feature. >> >> Sorry for my typo with gcc version - I ment mainline only since 4.7 >> does not use LRA. > > Btw, if it's a performance improvement all over the board why not adjust > the patterns itself to not allow memory operands? Thus, why restrict it > to combine not creating this instruction? (My -Os comment still stands) > > Richard. > >> Thanks. >> Yuri. >> >> >> >> 2012/12/12 Uros Bizjak <ubiz...@gmail.com>: >>> On Wed, Dec 12, 2012 at 12:27 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>> wrote: >>> >>>> This fix is aimed to remove performance degradation introduced by new >>>> LRA phase that in fact is combining problem. Gcc combiner does >>>> propagation of memory load to if-then-else gimple that was splitted >>>> back by old reload phase. LRA does not perform such splitting. To >>>> avoid performance slowdown on important benchmark (this is true for >>>> all x86 targets) we decided to enhance 'ix86_legitimate_combined_insn' >>>> with a check on such propagation and consider such conditional >>>> instruction with memory operand as illegal one from performance point >>>> of view. >>> >>> Is this true for all x86 targets? I have no objections to the >>> implementation, but these fine-tunings should be declared in >>> ix86_tune_features[] array, and used as conditions involving >>> TARGET_xxx in the code. Please see many examples in the i386 source >>> dir. >>> >>>> Is it OK for 4.8 and mainline? >>> >>> Hm, currently 4.8 _is_ mainline. Did you mean 4.7? >>> >>> Thanks, >>> Uros.