Hi,

Carl Love is working on a patch to add missing flavors of the vec_xst_be
intrinsic and test cases to cover all flavors.  He ran into a latent bug
in swap optimization that this patch addresses.  Swap optimization
operates on the principle that a computation can have swaps removed if
all permuting loads are accompanied by a swap, all permuting stores are
accompanied by a swap, and the remaining vector computations are
lane-insensitive or easy to adjust if lanes are swapped across
doublewords.
A new problem that arises with vec_xl_be and vec_xst_be is that the
same swap may accompany both a load and a store, so that removing that
swap changes the semantics of the program.  Suppose we have a vec_xl
from *(a+b) followed by a vec_xst_be to *(c+d).  The code at expand
time then looks like:

  lxvd2x  x,a,b
  xxswapd x,x
  stxvd2x x,c,d

The first two instructions are generated by vec_xl, while the last is
generated by vec_xst_be.  Swap optimization removes the xxswapd
because this sequence satisfies the rules, but now we have the same
result as if the vec_xst_be were actually a vec_xst.  To avoid this,
this patch marks a computation as unoptimizable if it contains a swap
that is both fed by a permuting load and feeds into a permuting store.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu for POWER8
with no regressions.  Carl has verified this fixes the related
problems in his test cases under development.  Is this okay for trunk?

Thanks,
Bill


2017-12-19  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

	* config/rs6000/rs6000-p8swap.c (swap_feeds_both_load_and_store):
	New function.
	(rs6000_analyze_swaps): Mark a web unoptimizable if it contains
	a swap associated with both a load and a store.


Index: gcc/config/rs6000/rs6000-p8swap.c
===================================================================
--- gcc/config/rs6000/rs6000-p8swap.c	(revision 255801)
+++ gcc/config/rs6000/rs6000-p8swap.c	(working copy)
@@ -327,6 +327,38 @@ insn_is_swap_p (rtx insn)
   return 1;
 }
 
+/* Return 1 iff UID, known to reference a swap, is both fed by a load
+   and a feeder of a store.  */
+static unsigned int
+swap_feeds_both_load_and_store (swap_web_entry *insn_entry)
+{
+  rtx insn = insn_entry->insn;
+  struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
+  df_ref def, use;
+  struct df_link *link = 0;
+  rtx_insn *load = 0, *store = 0;
+  unsigned int fed_by_load = 0;
+  unsigned int feeds_store = 0;
+
+  FOR_EACH_INSN_INFO_USE (use, insn_info)
+    {
+      link = DF_REF_CHAIN (use);
+      load = DF_REF_INSN (link->ref);
+      if (insn_is_load_p (load) && insn_is_swap_p (load))
+	fed_by_load = 1;
+    }
+
+  FOR_EACH_INSN_INFO_DEF (def, insn_info)
+    {
+      link = DF_REF_CHAIN (def);
+      store = DF_REF_INSN (link->ref);
+      if (insn_is_store_p (store) && insn_is_swap_p (store))
+	feeds_store = 1;
+    }
+
+  return fed_by_load & feeds_store;
+}
+
 /* Return TRUE if insn is a swap fed by a load from the constant pool.  */
 static bool
 const_load_sequence_p (swap_web_entry *insn_entry, rtx insn)
@@ -2029,6 +2061,14 @@ rs6000_analyze_swaps (function *fun)
 	  && !insn_entry[i].is_swap && !insn_entry[i].is_swappable)
 	root->web_not_optimizable = 1;
 
+      /* If we have a swap that is both fed by a permuting load
+	 and a feeder of a permuting store, then the optimization
+	 isn't appropriate.  (Consider vec_xl followed by vec_xst_be.)  */
+      else if (insn_entry[i].is_swap && !insn_entry[i].is_load
+	       && !insn_entry[i].is_store
+	       && swap_feeds_both_load_and_store (&insn_entry[i]))
+	root->web_not_optimizable = 1;
+
       /* If we have permuting loads or stores that are not accompanied
	 by a register swap, the optimization isn't appropriate.  */
       else if (insn_entry[i].is_load && insn_entry[i].is_swap)