On Thu, Jul 3, 2014 at 12:45 PM, Ilya Enkovich <enkovich....@gmail.com> wrote:
>>> Silvermont processors have penalty for instructions having 4+ bytes of >>> prefixes (including escape >>> bytes in opcode). This situation happens when REX prefix is used in SSE4 >>> instructions. This >>> patch tries to avoid such situation by preferring xmm0-xmm7 usage over >>> xmm8-xmm15 in those >>> instructions. I achieved it by adding new tuning flag and new alternatives >>> affected by tuning. >> >>> SSE4 instructions are not very widely used by GCC but I see some >>> significant gains caused by >>> this patch (tested on Avoton on -O3). >> >>> 2014-07-02 Ilya Enkovich <ilya.enkov...@intel.com> >> >>> * config/i386/constraints.md (Yr): New. >>> * config/i386/i386.h (reg_class): Add NO_REX_SSE_REGS. >>> (REG_CLASS_NAMES): Likewise. >>> (REG_CLASS_CONTENTS): Likewise. >>> * config/i386/sse.md (*vec_concatv2sf_sse4_1): Add alternatives >>> which use only NO_REX_SSE_REGS. >> >> You don't need to add alternatives, just change existing alternatives >> from "x" to "Yr". The allocator will handle reduced register set just >> fine. > > Hi, > > Thanks for review! > > My first patch version did such replacement. Performance results were > OK but I got into stability issues due to peephole2 pass. Peepholes > may exchange operands of instructions and ignore register restrictions > assuming all SSE registers are homogeneous. It caused unrecognized > instructions on some tests. I preferred to add a new alternative > instead of fixing peephole and possibly other similar problems. No, please rather fix the peephole2 patterns. It is just a matter of putting satisfies_constraint_Xx to their insn condition. In effect, peephole2 pass is nullifying your optimization. Also, RA is still free to allocate unwanted registers, even when prefixed with "?". Uros.