On Thu, Aug 14, 2014 at 4:50 AM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
> Hi All,
>
> Here is a fix for PR 62011 - remove false dependency for unary
> bit-manipulation instructions for latest BigCore chips (Sandybridge
> and Haswell) by outputting in assembly file zeroing destination
> register before bmi instruction. I checked that performance restored
> for popcnt, lzcnt and tzcnt instructions.
>
> Bootstrap and regression testing did not show any new failures.
>
> Is it OK for trunk?
>
> gcc/ChangeLog
> 2014-08-14  Yuri Rumyantsev  <ysrum...@gmail.com>
>
>         PR target/62011
>         * config/i386/i386-protos.h (ix86_avoid_false_dep_for_bm): New function
>         prototype.
>         * config/i386/i386.c (ix86_avoid_false_dep_for_bm): New function.
>         * config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_BM) New macros.
>         * config/i386/i386.md (ctz<mode>2, clz<mode>2_lzcnt, popcount<mode>2,
>         *popcount<mode>2_cmp, *popcountsi2_cmp_zext): Output zeroing
>         destination register for unary bit-manipulation instructions
>         if required.

Why don't you use a splitter to generate the XOR?

>         * config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_BM): New.

Is this needed for r16 and r32? The original report says that only r64
is affected:

http://stackoverflow.com/questions/25078285/replacing-a-32-bit-loop-count-variable-with-64-bit-introduces-crazy-performance

Have you tried this on Silvermont? Does it help Silvermont?

-- 
H.J.
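
For anyone following the thread, here is a minimal sketch of the idea being
discussed: zero the destination register with XOR right before the
bit-manipulation instruction, so the instruction no longer waits on the old
value of that register. This is not code from the patch. The helper name
popcount64_nodep is hypothetical, the sketch assumes a GCC-compatible
compiler, and the inline-asm path is only taken when building for x86-64
with -mpopcnt; otherwise it falls back to __builtin_popcountll.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only (not part of the patch).
   Emits "xor dst, dst" ahead of popcnt to break the false dependency
   on the destination register.  */
static inline uint64_t
popcount64_nodep (uint64_t x)
{
#if defined (__GNUC__) && defined (__x86_64__) && defined (__POPCNT__)
  uint64_t r;
  /* "=&r" is an early-clobber output: the XOR writes r before x is
     read, so r must not share a register with the input operand.
     %k0 names the 32-bit view of operand 0; a 32-bit XOR clears the
     full 64-bit register.  Both instructions modify the flags.  */
  __asm__ ("xorl %k0, %k0\n\t"
           "popcntq %1, %0"
           : "=&r" (r)
           : "rm" (x)
           : "cc");
  return r;
#else
  /* Portable fallback when popcnt is not enabled at compile time.  */
  return (uint64_t) __builtin_popcountll (x);
#endif
}
```

The patch under review does this inside the compiler, in the i386.md
output templates, rather than in user code; the sketch only shows the
instruction sequence that results.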