On Mon, Aug 17, 2020 at 5:34 PM Uros Bizjak <ubiz...@gmail.com> wrote: > > On Fri, Aug 14, 2020 at 10:24 AM Hongtao Liu <crazy...@gmail.com> wrote: > > > > Enable direct move between masks and gprs in pass_reload with > > consideration of cost model. > > > > Changelog > > gcc/ > > * config/i386/i386.c (inline_secondary_memory_needed): > > No memory is needed between mask regs and gpr. > > (ix86_hard_regno_mode_ok): Add condition TARGET_AVX512F for > > mask regno. > > * config/i386/i386.h (enum reg_class): Add INT_MASK_REGS. > > (REG_CLASS_NAMES): Ditto. > > (REG_CLASS_CONTENTS): Ditto. > > * config/i386/i386.md: Exclude mask register in > > define_peephole2 which is available only for gpr. > > > > gcc/testsuites/ > > * gcc.target/i386/pr71453-1.c: New tests. > > * gcc.target/i386/pr71453-2.c: Ditto. > > * gcc.target/i386/pr71453-3.c: Ditto. > > * gcc.target/i386/pr71453-4.c: Ditto. > > @@ -18571,9 +18571,7 @@ inline_secondary_memory_needed (machine_mode > mode, reg_class_t class1, > || MAYBE_SSE_CLASS_P (class1) != SSE_CLASS_P (class1) > || MAYBE_SSE_CLASS_P (class2) != SSE_CLASS_P (class2) > || MAYBE_MMX_CLASS_P (class1) != MMX_CLASS_P (class1) > - || MAYBE_MMX_CLASS_P (class2) != MMX_CLASS_P (class2) > - || MAYBE_MASK_CLASS_P (class1) != MASK_CLASS_P (class1) > - || MAYBE_MASK_CLASS_P (class2) != MASK_CLASS_P (class2)) > + || MAYBE_MMX_CLASS_P (class2) != MMX_CLASS_P (class2)) > { > gcc_assert (!strict || lra_in_progress); > return true; > > No, this is still needed, the reason is explained in the comment above
Remove my change here. > inline_secondary_memory_needed: > > The function can't work reliably when one of the CLASSES is a class > containing registers from multiple sets. We avoid this by never combining > different sets in a single alternative in the machine description. > Ensure that this constraint holds to avoid unexpected surprises. > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md > index b24a4557871..74d207c3711 100644 > --- a/gcc/config/i386/i386.md > +++ b/gcc/config/i386/i386.md > @@ -15051,7 +15051,7 @@ > (parallel [(set (reg:CC FLAGS_REG) > (unspec:CC [(match_dup 0)] UNSPEC_PARITY)) > (clobber (match_dup 0))])] > - "" > + "!MASK_REGNO_P (REGNO (operands[0]))" > [(set (reg:CC FLAGS_REG) > (unspec:CC [(match_dup 1)] UNSPEC_PARITY))]) > > @@ -15072,6 +15072,7 @@ > (label_ref (match_operand 5)) > (pc)))] > "REGNO (operands[2]) == REGNO (operands[3]) > + && !MASK_REGNO_P (REGNO (operands[1])) > && peep2_reg_dead_p (3, operands[0]) > && peep2_reg_dead_p (3, operands[2]) > && peep2_regno_dead_p (4, FLAGS_REG)" > Changed for upper two define_peepholes. > Actually, there are several (historic?) peephole2 patterns that assume I didn't find those patterns. I looked through i386.md, there are 3 cases 1. Since mask registers are only used for "mov/zero_extend/bitwise" patterns, peephole2 patterns involving only other patterns are safe to use general_registers. 2. For those peephole2 patterns containing both mov/zero_extend/bitwise and other patterns, implicit conditions such as match_dup used in other patterns will make them the same as case 1. 3. Peephole2 patterns are safe to use mask registers, since they would be eliminated in output patterns. > register_operand means only integer registers. Just change > register_operand to general_reg_operand and eventually > nonimmediate_operand to nonimmediate_gr_operand. Do not put additional > predicates into insn predicate. > > Uros. Update patch. -- BR, Hongtao
From 5a07403279622447cc6503a8dcc3c0cecb9ffcef Mon Sep 17 00:00:00 2001 From: liuhongt <hongtao....@intel.com> Date: Thu, 24 Oct 2019 11:13:00 +0800 Subject: [PATCH 3/4] According to instruction_tables.pdf 1. Set cost of movement inside mask registers a bit higher than gpr's. 2. Set cost of movement between mask register and gpr much higher than movement inside gpr, but still less equal than load/store. 3. Set cost of mask register load/store a bit higher than gpr load/store. --- gcc/config/i386/x86-tune-costs.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h index 256c84e364e..a782a9dd9e3 100644 --- a/gcc/config/i386/x86-tune-costs.h +++ b/gcc/config/i386/x86-tune-costs.h @@ -1727,12 +1727,12 @@ struct processor_costs skylake_cost = { {8, 8, 8, 12, 24}, /* cost of storing SSE registers in 32,64,128,256 and 512-bit */ 6, 6, /* SSE->integer and integer->SSE moves */ - 2, 2, /* mask->integer and integer->mask moves */ - {4, 4, 4}, /* cost of loading mask register + 4, 6, /* mask->integer and integer->mask moves */ + {6, 6, 6}, /* cost of loading mask register in QImode, HImode, SImode. */ - {6, 6, 6}, /* cost if storing mask register + {8, 8, 8}, /* cost if storing mask register in QImode, HImode, SImode. */ - 2, /* cost of moving mask register. */ + 3, /* cost of moving mask register. */ /* End of register allocator costs. */ }, -- 2.18.1