On Mon, Aug 17, 2020 at 5:34 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Fri, Aug 14, 2020 at 10:24 AM Hongtao Liu <crazy...@gmail.com> wrote:
> >
> >   Enable direct move between masks and gprs in pass_reload with
> > consideration of cost model.
> >
> > Changelog
> > gcc/
> >         * config/i386/i386.c (inline_secondary_memory_needed):
> >         No memory is needed between mask regs and gpr.
> >         (ix86_hard_regno_mode_ok): Add condition TARGET_AVX512F for
> >         mask regno.
> >         * config/i386/i386.h (enum reg_class): Add INT_MASK_REGS.
> >         (REG_CLASS_NAMES): Ditto.
> >         (REG_CLASS_CONTENTS): Ditto.
> >         * config/i386/i386.md: Exclude mask register in
> >         define_peephole2 which is available only for gpr.
> >
> > gcc/testsuite/
> >         * gcc.target/i386/pr71453-1.c: New test.
> >         * gcc.target/i386/pr71453-2.c: Ditto.
> >         * gcc.target/i386/pr71453-3.c: Ditto.
> >         * gcc.target/i386/pr71453-4.c: Ditto.
>
> @@ -18571,9 +18571,7 @@ inline_secondary_memory_needed (machine_mode
> mode, reg_class_t class1,
>        || MAYBE_SSE_CLASS_P (class1) != SSE_CLASS_P (class1)
>        || MAYBE_SSE_CLASS_P (class2) != SSE_CLASS_P (class2)
>        || MAYBE_MMX_CLASS_P (class1) != MMX_CLASS_P (class1)
> -      || MAYBE_MMX_CLASS_P (class2) != MMX_CLASS_P (class2)
> -      || MAYBE_MASK_CLASS_P (class1) != MASK_CLASS_P (class1)
> -      || MAYBE_MASK_CLASS_P (class2) != MASK_CLASS_P (class2))
> +      || MAYBE_MMX_CLASS_P (class2) != MMX_CLASS_P (class2))
>      {
>        gcc_assert (!strict || lra_in_progress);
>        return true;
>
> No, this is still needed, the reason is explained in the comment above

Removed my change here; the MAYBE_MASK_CLASS_P checks are kept.

> inline_secondary_memory_needed:
>
>    The function can't work reliably when one of the CLASSES is a class
>    containing registers from multiple sets.  We avoid this by never combining
>    different sets in a single alternative in the machine description.
>    Ensure that this constraint holds to avoid unexpected surprises.
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index b24a4557871..74d207c3711 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -15051,7 +15051,7 @@
>     (parallel [(set (reg:CC FLAGS_REG)
>             (unspec:CC [(match_dup 0)] UNSPEC_PARITY))
>            (clobber (match_dup 0))])]
> -  ""
> +  "!MASK_REGNO_P (REGNO (operands[0]))"
>    [(set (reg:CC FLAGS_REG)
>      (unspec:CC [(match_dup 1)] UNSPEC_PARITY))])
>
> @@ -15072,6 +15072,7 @@
>                 (label_ref (match_operand 5))
>                 (pc)))]
>    "REGNO (operands[2]) == REGNO (operands[3])
> +   && !MASK_REGNO_P (REGNO (operands[1]))
>     && peep2_reg_dead_p (3, operands[0])
>     && peep2_reg_dead_p (3, operands[2])
>     && peep2_regno_dead_p (4, FLAGS_REG)"
>

Changed the two define_peephole2 patterns above.

> Actually, there are several (historic?) peephole2 patterns that assume

I didn't find those patterns.

I looked through i386.md; there are three cases:
1. Mask registers are used only in "mov/zero_extend/bitwise" patterns, so
peephole2 patterns that involve only other kinds of patterns can only ever
match general registers and are safe as they are.
2. For peephole2 patterns that contain both mov/zero_extend/bitwise patterns
and other patterns, implicit conditions such as a match_dup used in the other
patterns restrict the operands to general registers, which reduces them to
case 1 (see the sketch after this list).
3. Some peephole2 patterns are safe even when mask registers match, since
those registers are eliminated in the output patterns.
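
A hypothetical sketch of case 2 (not an actual pattern from i386.md; the
fold of the copy into the addition is purely for illustration): the plus
insn exists only for general registers, and both operands reappear in it
via match_dup, so mask registers can never match this peephole2 even
though the move operands use plain register_operand.

;; Hypothetical case-2 sketch, not taken from i386.md.  The (match_dup 0)
;; and (match_dup 1) inside the plus insn, which has no mask-register
;; form, implicitly restrict both operands to general registers.
(define_peephole2
  [(set (match_operand:SI 0 "register_operand")
        (match_operand:SI 1 "register_operand"))
   (parallel [(set (match_dup 0)
                   (plus:SI (match_dup 0) (match_dup 1)))
              (clobber (reg:CC FLAGS_REG))])]
  ""
  ;; Fold the copy into the addition (operand 0 becomes 2 * operand 1,
  ;; which the backend can emit as an lea).
  [(parallel [(set (match_dup 0)
                   (plus:SI (match_dup 1) (match_dup 1)))
              (clobber (reg:CC FLAGS_REG))])])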

> register_operand means only integer registers. Just change
> register_operand to general_reg_operand and eventually
> nonimmediate_operand to nonimmediate_gr_operand. Do not put additional
> predicates into insn predicate.
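
For reference, the parity peephole2 quoted above would then look roughly
like this (a sketch only, assuming its first insn is a plain reg-to-reg
copy and showing SImode just for illustration; the actual pattern and the
updated patch may differ in detail):

;; Sketch: with general_reg_operand the class restriction lives in the
;; operand predicates, so mask registers simply never match and no
;; !MASK_REGNO_P insn condition is needed.
(define_peephole2
  [(set (match_operand:SI 0 "general_reg_operand")
        (match_operand:SI 1 "general_reg_operand"))
   (parallel [(set (reg:CC FLAGS_REG)
                   (unspec:CC [(match_dup 0)] UNSPEC_PARITY))
              (clobber (match_dup 0))])]
  ""
  [(set (reg:CC FLAGS_REG)
        (unspec:CC [(match_dup 1)] UNSPEC_PARITY))])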



>
> Uros.

Updated patch below.

-- 
BR,
Hongtao
From 5a07403279622447cc6503a8dcc3c0cecb9ffcef Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao....@intel.com>
Date: Thu, 24 Oct 2019 11:13:00 +0800
Subject: [PATCH 3/4] According to instruction_tables.pdf

1. Set the cost of moves within mask registers a bit higher than moves within gprs.
2. Set the cost of moves between mask registers and gprs much higher than moves
   within gprs, but still less than or equal to the load/store cost.
3. Set the cost of mask register loads/stores a bit higher than gpr loads/stores.
---
 gcc/config/i386/x86-tune-costs.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 256c84e364e..a782a9dd9e3 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1727,12 +1727,12 @@ struct processor_costs skylake_cost = {
   {8, 8, 8, 12, 24},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit */
   6, 6,					/* SSE->integer and integer->SSE moves */
-  2, 2,				/* mask->integer and integer->mask moves */
-  {4, 4, 4},				/* cost of loading mask register
+  4, 6,				/* mask->integer and integer->mask moves */
+  {6, 6, 6},				/* cost of loading mask register
 					   in QImode, HImode, SImode.  */
-  {6, 6, 6},				/* cost if storing mask register
+  {8, 8, 8},				/* cost if storing mask register
 					   in QImode, HImode, SImode.  */
-  2,					/* cost of moving mask register.  */
+  3,					/* cost of moving mask register.  */
   /* End of register allocator costs.  */
   },
 
-- 
2.18.1
