RE: [Aarch64][SVE] Vectorise sum-of-absolute-differences

Alejandro Martinez Vicente Mon, 11 Feb 2019 07:38:11 -0800

> -----Original Message-----
> From: James Greenhalgh <james.greenha...@arm.com>
> Sent: 06 February 2019 17:42
> To: Alejandro Martinez Vicente <alejandro.martinezvice...@arm.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>; Richard
> Sandiford <richard.sandif...@arm.com>; Richard Biener
> <richard.guent...@gmail.com>
> Subject: Re: [Aarch64][SVE] Vectorise sum-of-absolute-differences
> 
> On Mon, Feb 04, 2019 at 07:34:05AM -0600, Alejandro Martinez Vicente
> wrote:
> > Hi,
> >
> > This patch adds support to vectorize sum of absolute differences
> > (SAD_EXPR) using SVE. It also uses the new functionality to ensure
> > that the resulting loop is masked. Therefore, it depends on
> >
> > https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00016.html
> >
> > Given this input code:
> >
> > int
> > sum_abs (uint8_t *restrict x, uint8_t *restrict y, int n) {
> >   int sum = 0;
> >
> >   for (int i = 0; i < n; i++)
> >     {
> >       sum += __builtin_abs (x[i] - y[i]);
> >     }
> >
> >   return sum;
> > }
> >
> > The resulting SVE code is:
> >
> > 0000000000000000 <sum_abs>:
> >    0:       7100005f        cmp     w2, #0x0
> >    4:       5400026d        b.le    50 <sum_abs+0x50>
> >    8:       d2800003        mov     x3, #0x0                        // #0
> >    c:       93407c42        sxtw    x2, w2
> >   10:       2538c002        mov     z2.b, #0
> >   14:       25221fe0        whilelo p0.b, xzr, x2
> >   18:       2538c023        mov     z3.b, #1
> >   1c:       2518e3e1        ptrue   p1.b
> >   20:       a4034000        ld1b    {z0.b}, p0/z, [x0, x3]
> >   24:       a4034021        ld1b    {z1.b}, p0/z, [x1, x3]
> >   28:       0430e3e3        incb    x3
> >   2c:       0520c021        sel     z1.b, p0, z1.b, z0.b
> >   30:       25221c60        whilelo p0.b, x3, x2
> >   34:       040d0420        uabd    z0.b, p1/m, z0.b, z1.b
> >   38:       44830402        udot    z2.s, z0.b, z3.b
> >   3c:       54ffff21        b.ne    20 <sum_abs+0x20>  // b.any
> >   40:       2598e3e0        ptrue   p0.s
> >   44:       04812042        uaddv   d2, p0, z2.s
> >   48:       1e260040        fmov    w0, s2
> >   4c:       d65f03c0        ret
> >   50:       1e2703e2        fmov    s2, wzr
> >   54:       1e260040        fmov    w0, s2
> >   58:       d65f03c0        ret
> >
> > Notice how udot is used inside a fully masked loop.
> >
> > I tested this patch in an aarch64 machine bootstrapping the compiler
> > and running the checks.
> 
> This doesn't give us much confidence in SVE coverage; unless you have been
> running in an environment using SVE by default? Do you have some set of
> workloads you could test the compiler against to ensure correct operation of
> the SVE vectorization?
> 
I tested it using an SVE model and a large set of workloads, including SPEC
2000, 2006 and 2017. On the plus side, nothing broke. However, the impact on
performance was minimal (on average, a tiny gain over the whole set of
workloads).


I still want this patch (and the companion dot product patch) to make it into
the compiler because they are the first steps towards vectorising workloads
using fully masked loops when the target ISA (like SVE) doesn't support masking
in all the operations.
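
To illustrate the idea, here is a rough scalar C sketch of what the masked
loop quoted above computes. It is only an emulation under stated assumptions,
not the code the patch generates: VL, sum_abs_masked and sum_abs_ref are
hypothetical names, and VL stands in for a vector length that real SVE
hardware chooses. Since the dot product has no predicated form, the inactive
lanes of one operand are replaced with the other operand's lanes, so their
absolute difference is zero and they add nothing to the sum.

```c
#include <stdint.h>

/* Hypothetical vector length in bytes, for illustration only.  */
#define VL 16

/* Plain scalar reference, same computation as sum_abs in the quoted mail.  */
int
sum_abs_ref (const uint8_t *x, const uint8_t *y, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += x[i] > y[i] ? x[i] - y[i] : y[i] - x[i];
  return sum;
}

/* Scalar emulation of the fully masked loop (assumed structure).  */
int
sum_abs_masked (const uint8_t *x, const uint8_t *y, int n)
{
  uint32_t acc[VL / 4] = { 0 };	/* 32-bit accumulator lanes (z2.s).  */

  for (int base = 0; base < n; base += VL)
    for (int lane = 0; lane < VL; lane++)
      {
	/* Loop predicate (whilelo): lane is active while in bounds.  */
	int active = base + lane < n;

	/* Zeroing predicated loads (ld1b ... p0/z).  */
	uint8_t a = active ? x[base + lane] : 0;
	uint8_t b = active ? y[base + lane] : 0;

	/* Select (sel): inactive lanes of b take a's value.  */
	if (!active)
	  b = a;

	/* Absolute difference on all lanes (uabd under ptrue).  */
	uint8_t d = a > b ? a - b : b - a;

	/* Dot product with a vector of ones (udot): each group of
	   four bytes accumulates into one 32-bit lane.  */
	acc[lane / 4] += (uint32_t) d * 1;
      }

  /* Final horizontal reduction (uaddv).  */
  uint32_t sum = 0;
  for (int i = 0; i < VL / 4; i++)
    sum += acc[i];
  return (int) sum;
}
```

For any n, including values that are not a multiple of VL, sum_abs_masked
should return the same result as sum_abs_ref.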

Alejandro

> >
> > I admit it is too late to merge this into gcc 9, but I'm posting it
> > anyway so it can be considered for gcc 10.
> 
> Richard Sandiford has the call on whether this patch is OK for trunk now or
> GCC 10. With the minimal testing it has had, I'd be uncomfortable with it as a
> GCC 9 patch. That said, it is a fairly self-contained pattern for the compiler
> and it would be good to see this optimization in GCC 9.
> 
> >
> > Alejandro
> >
> >
> > gcc/Changelog:
> >
> > 2019-02-04  Alejandro Martinez  <alejandro.martinezvice...@arm.com>
> >
> >     * config/aarch64/aarch64-sve.md (<su>abd<mode>_3): New define_expand.
> >     (aarch64_<su>abd<mode>_3): Likewise.
> >     (*aarch64_<su>abd<mode>_3): New define_insn.
> >     (<sur>sad<vsi2qi>): New define_expand.
> >     * config/aarch64/iterators.md: Added MAX_OPP and max_opp attributes.
> >     Added USMAX iterator.
> >     * config/aarch64/predicates.md: Added aarch64_smin and aarch64_umin
> >     predicates.
> >     * tree-vect-loop.c (use_mask_by_cond_expr_p): Add SAD_EXPR.
> >     (build_vect_cond_expr): Likewise.
> >
> > gcc/testsuite/Changelog:
> >
> > 2019-02-04  Alejandro Martinez  <alejandro.martinezvice...@arm.com>
> >
> >     * gcc.target/aarch64/sve/sad_1.c: New test for sum of absolute
> >     differences.
>
