On Thu, Sep 10, 2015 at 1:41 PM, Bill Schmidt
<wschm...@linux.vnet.ibm.com> wrote:
> Currently the little-endian swap optimization is disabled for
> computations that include vector permute instructions.  We generate a
> vperm in a variety of ways, and for the most general cases, we can't
> replace a vperm with a swap-equivalent sequence without cost modeling.
> However, the most common use of vperm is using an UNSPEC_VPERM where the
> mask operand is loaded from the constant pool.  For these cases, we can
> optimize the computation provided that we change the loaded constant.
> This patch recognizes these cases and provides the necessary special
> handling.
>
> An abbreviated description of the general case being recognized is:
>
>   (set (reg:DI A constant-pool-symbol))
>   (set (reg:V16QI B (swap (mem:V16QI (reg:DI A)))))   ; lxvd2x
>   (set (reg:V16QI C (swap (reg:V16QI B))))            ; xxpermdi
>   (set (reg D (unspec [(reg X)(reg Y)(reg:V16QI C)] UNSPEC_VPERM)))
>
> where "swap" is a vec_select operation that performs a doubleword swap.
> We adjust the mask to be used and create a new constant pool entry for
> it, with a new MEM used to load it.  The new MEM is substituted into the
> load instruction, after which cleanup is done on the dataflow
> structures.
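
For readers following the mask adjustment, here is a minimal sketch of the
idea in plain C. It is not the code from the patch. It assumes a simplified
permute model in which selector values 0..15 pick a byte of the first input
and 16..31 pick a byte of the second, and it ignores PowerPC element
numbering and the swap that stays on the mask load itself. The function
names (dword_swap, permute, adjust_mask, adjusted_permute_matches) are
hypothetical, chosen only for illustration. Under those assumptions, if both
inputs arrive doubleword-swapped, then adding 8 modulo 16 to each selector
while keeping its input-select bit picks the same bytes as before.

```c
#include <stdint.h>
#include <string.h>

/* Doubleword swap: exchange the two 8-byte halves of a 16-byte vector.
   This models the "swap" vec_select described above.  IN and OUT must
   not alias.  */
void dword_swap(const uint8_t in[16], uint8_t out[16])
{
    memcpy(out, in + 8, 8);
    memcpy(out + 8, in, 8);
}

/* Simplified permute model (an assumption, not the exact vperm
   semantics): selector 0..15 picks a byte of X, 16..31 a byte of Y.  */
void permute(const uint8_t x[16], const uint8_t y[16],
             const uint8_t mask[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++) {
        uint8_t sel = mask[i] & 31;
        out[i] = sel < 16 ? x[sel] : y[sel - 16];
    }
}

/* Hypothetical adjustment: keep the input-select bit (16) and move the
   byte index to the other doubleword, so the selector addresses the
   same byte once the input has been doubleword-swapped.  */
void adjust_mask(const uint8_t mask[16], uint8_t adjusted[16])
{
    for (int i = 0; i < 16; i++) {
        uint8_t sel = mask[i] & 31;
        adjusted[i] = (uint8_t)((sel & 16) | ((sel + 8) & 15));
    }
}

/* Return 1 if permuting the swapped inputs with the adjusted mask gives
   the same bytes as permuting the original inputs with the original
   mask, 0 otherwise.  */
int adjusted_permute_matches(const uint8_t x[16], const uint8_t y[16],
                             const uint8_t mask[16])
{
    uint8_t xs[16], ys[16], adj[16], want[16], got[16];

    permute(x, y, mask, want);

    dword_swap(x, xs);
    dword_swap(y, ys);
    adjust_mask(mask, adj);
    permute(xs, ys, adj, got);

    return memcmp(want, got, 16) == 0;
}
```

In this model the check holds for any mask, because a doubleword swap moves
byte k to position (k + 8) mod 16. The real transformation also has to
account for the layout in which the new constant is loaded, which this
sketch leaves out.
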
>
> I've added two new tests, one to verify that swaps are removed, and one
> to verify that the vperm transformation produces correct results.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Is this ok for trunk?

Okay.

Thanks, David
