On Thu, Sep 10, 2015 at 1:41 PM, Bill Schmidt <wschm...@linux.vnet.ibm.com> wrote:
> Currently the little-endian swap optimization is disabled for
> computations that include vector permute instructions. We generate a
> vperm in a variety of ways, and for the most general cases, we can't
> replace a vperm with a swap-equivalent sequence without cost modeling.
> However, the most common use of vperm is using an UNSPEC_VPERM where the
> mask operand is loaded from the constant pool. For these cases, we can
> optimize the computation provided that we change the loaded constant.
> This patch recognizes these cases and provides the necessary special
> handling.
>
> An abbreviated description of the general case being recognized is:
>
> (set (reg:DI A constant-pool-symbol))
> (set (reg:V16QI B (swap (mem:V16QI (reg:DI A))))) ; lxvd2x
> (set (reg:V16QI C (swap (reg:V16QI B)))) ; xxpermdi
> (set (reg D (unspec [(reg X)(reg Y)(reg:V16QI C)] UNSPEC_VPERM)))
>
> where "swap" is a vec_select operation that performs a doubleword swap.
> We adjust the mask to be used and create a new constant pool entry for
> it, with a new MEM used to load it. The new MEM is substituted into the
> load instruction, after which cleanup is done on the dataflow
> structures.
>
> I've added two new tests, one to verify that swaps are removed, and one
> to verify that the vperm transformation produces correct results.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions. Is this ok for trunk?

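A small C model of the mask adjustment described in the patch may help. It rests on assumptions that the patch does not state: vperm is modeled as selecting byte (mask[i] & 31) from the 32-byte concatenation of X and Y in big-endian element order; the "swap" is modeled as exchanging the two 8-byte halves of a 16-byte value; and the adjustment applied to the pool constant (XOR each mask byte with 8, so each selector picks the same byte from the other doubleword of the same input) is derived from those assumptions, not copied from the patch. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of vperm: result byte i is byte (mask[i] & 31) of the 32-byte
   concatenation x||y, using big-endian element numbering. */
static void model_vperm(const uint8_t x[16], const uint8_t y[16],
                        const uint8_t mask[16], uint8_t out[16])
{
    for (int i = 0; i < 16; ++i) {
        unsigned idx = mask[i] & 31u;
        out[i] = idx < 16 ? x[idx] : y[idx - 16];
    }
}

/* Model of the doubleword swap: exchange the two 8-byte halves,
   so out[j] == in[j ^ 8]. */
static void dword_swap(const uint8_t in[16], uint8_t out[16])
{
    memcpy(out, in + 8, 8);
    memcpy(out + 8, in, 8);
}

/* Hypothetical adjustment of the constant-pool mask: flip bit 3 of each
   selector so it addresses the other doubleword of the same input.
   Bit 4 (which input) and the ignored high bits are left unchanged. */
static void adjust_mask_for_swapped_dwords(const uint8_t mask[16],
                                           uint8_t out[16])
{
    for (int i = 0; i < 16; ++i)
        out[i] = (uint8_t)(mask[i] ^ 8u);
}

/* Returns 1 if the swap-free sequence (doubleword-swapped inputs, adjusted
   mask loaded without its corrective swap) produces the doubleword-swapped
   form of the original vperm result, 0 otherwise. */
static int swapped_vperm_matches(const uint8_t x[16], const uint8_t y[16],
                                 const uint8_t m[16])
{
    uint8_t ref[16], ref_swapped[16];
    model_vperm(x, y, m, ref);
    dword_swap(ref, ref_swapped);

    uint8_t xs[16], ys[16], adj[16], adj_reg[16], got[16];
    dword_swap(x, xs);
    dword_swap(y, ys);
    adjust_mask_for_swapped_dwords(m, adj);
    dword_swap(adj, adj_reg); /* load with no following corrective swap */
    model_vperm(xs, ys, adj_reg, got);
    return memcmp(got, ref_swapped, 16) == 0;
}
```

Once the corrective swaps are removed, X, Y and the loaded mask all sit in their registers doubleword-swapped. Under this model, flipping bit 3 of every byte of the pool constant compensates for that, so the vperm result comes out in the same swapped form as the other values the optimization leaves swapped, and `swapped_vperm_matches` returns 1 for any inputs and mask.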
Okay.

Thanks, David