Hi,

Carl Love is working on a patch that adds the missing flavors of the
vec_xst_be intrinsic, together with test cases covering all flavors.
While doing so, he ran into a latent bug in swap optimization, which
the patch below addresses.
Swap optimization operates on the principle that a computation can
have swaps removed if all permuting loads are accompanied by a swap,
all permuting stores are accompanied by a swap, and the remaining
vector computations are lane-insensitive or easy to adjust if lanes
are swapped across doublewords.
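
As a concrete illustration, here is a minimal sketch (a made-up
example, not one of Carl's test cases; little-endian POWER8 code
generation is assumed).  Each vec_xl expands to lxvd2x followed by
xxswapd, the vec_xst expands to xxswapd followed by stxvd2x, and
vec_add is lane-insensitive, so swap optimization can remove every
xxswapd without changing the result:

   #include <altivec.h>

   void
   sum_vectors (int *src1, int *src2, int *dst)
   {
     vector int a = vec_xl (0, src1);   /* lxvd2x + xxswapd  */
     vector int b = vec_xl (0, src2);   /* lxvd2x + xxswapd  */
     vector int c = vec_add (a, b);     /* lane-insensitive  */
     vec_xst (c, 0, dst);               /* xxswapd + stxvd2x */
   }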

A new problem that arises with vec_xl_be and vec_xst_be is that the
same swap may accompany both a load and a store, so that removing
that swap changes the semantics of the program.  Suppose we have a
vec_xl from *(a+b) followed by a vec_xst_be to *(c+d).  The code at
expand time then looks like:

   lxvd2x x,a,b
   xxswapd x,x
   stxvd2x x,c,d

The first two instructions are generated by vec_xl, while the last
is generated by vec_xst_be.  Swap optimization removes the xxswapd
because this sequence satisfies the rules, but now we have the same
result as if the vec_xst_be were actually a vec_xst.
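
For reference, source along these lines produces the sequence above
(a minimal sketch; the function and variable names are illustrative,
not taken from the test cases under development):

   #include <altivec.h>

   void
   copy_be (int *a, long b, int *c, long d)
   {
     vector int x = vec_xl (b, a);   /* lxvd2x x,a,b ; xxswapd x,x */
     vec_xst_be (x, d, c);           /* stxvd2x x,c,d */
   }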

To avoid this, the patch marks a computation as unoptimizable if it
contains a swap that is both fed by a permuting load and feeds into
a permuting store.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu for POWER8
with no regressions.  Carl has verified this fixes the related
problems in his test cases under development.  Is this okay for
trunk?

Thanks,
Bill


2017-12-19  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        * config/rs6000/rs6000-p8swap.c (swap_feeds_both_load_and_store):
        New function.
        (rs6000_analyze_swaps): Mark a web unoptimizable if it contains a
        swap associated with both a load and a store.


Index: gcc/config/rs6000/rs6000-p8swap.c
===================================================================
--- gcc/config/rs6000/rs6000-p8swap.c   (revision 255801)
+++ gcc/config/rs6000/rs6000-p8swap.c   (working copy)
@@ -327,6 +327,38 @@ insn_is_swap_p (rtx insn)
   return 1;
 }
 
+/* Return 1 iff INSN_ENTRY, known to reference a swap insn, is both
+   fed by a load and a feeder of a store.  */
+static unsigned int
+swap_feeds_both_load_and_store (swap_web_entry *insn_entry)
+{
+  rtx insn = insn_entry->insn;
+  struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
+  df_ref def, use;
+  struct df_link *link = 0;
+  rtx_insn *load = 0, *store = 0;
+  unsigned int fed_by_load = 0;
+  unsigned int feeds_store = 0;
+
+  FOR_EACH_INSN_INFO_USE (use, insn_info)
+    {
+      link = DF_REF_CHAIN (use);
+      load = DF_REF_INSN (link->ref);
+      if (insn_is_load_p (load) && insn_is_swap_p (load))
+       fed_by_load = 1;
+    }
+
+  FOR_EACH_INSN_INFO_DEF (def, insn_info)
+    {
+      link = DF_REF_CHAIN (def);
+      store = DF_REF_INSN (link->ref);
+      if (insn_is_store_p (store) && insn_is_swap_p (store))
+       feeds_store = 1;
+    }
+
+  return fed_by_load & feeds_store;
+}
+
 /* Return TRUE if insn is a swap fed by a load from the constant pool.  */
 static bool
 const_load_sequence_p (swap_web_entry *insn_entry, rtx insn)
@@ -2029,6 +2061,14 @@ rs6000_analyze_swaps (function *fun)
          && !insn_entry[i].is_swap && !insn_entry[i].is_swappable)
        root->web_not_optimizable = 1;
 
+      /* If we have a swap that is both fed by a permuting load
+        and a feeder of a permuting store, then the optimization
+        isn't appropriate.  (Consider vec_xl followed by vec_xst_be.)  */
+      else if (insn_entry[i].is_swap && !insn_entry[i].is_load
+              && !insn_entry[i].is_store
+              && swap_feeds_both_load_and_store (&insn_entry[i]))
+       root->web_not_optimizable = 1;
+
       /* If we have permuting loads or stores that are not accompanied
         by a register swap, the optimization isn't appropriate.  */
       else if (insn_entry[i].is_load && insn_entry[i].is_swap)
