Hi,

I updated the patch after the dot product patch went in. This is the new cover letter:

This patch adds support for vectorizing sum of absolute differences
(SAD_EXPR) using SVE.

Given this input code:

int
sum_abs (uint8_t *restrict x, uint8_t *restrict y, int n)
{
  int sum = 0;

  for (int i = 0; i < n; i++)
    {
      sum += __builtin_abs (x[i] - y[i]);
    }

  return sum;
}

The resulting SVE code is:

0000000000000000 <sum_abs>:
   0:   7100005f    cmp     w2, #0x0
   4:   5400026d    b.le    50 <sum_abs+0x50>
   8:   d2800003    mov     x3, #0x0                // #0
   c:   93407c42    sxtw    x2, w2
  10:   2538c002    mov     z2.b, #0
  14:   25221fe0    whilelo p0.b, xzr, x2
  18:   2538c023    mov     z3.b, #1
  1c:   2518e3e1    ptrue   p1.b
  20:   a4034000    ld1b    {z0.b}, p0/z, [x0, x3]
  24:   a4034021    ld1b    {z1.b}, p0/z, [x1, x3]
  28:   0430e3e3    incb    x3
  2c:   0520c021    sel     z1.b, p0, z1.b, z0.b
  30:   25221c60    whilelo p0.b, x3, x2
  34:   040d0420    uabd    z0.b, p1/m, z0.b, z1.b
  38:   44830402    udot    z2.s, z0.b, z3.b
  3c:   54ffff21    b.ne    20 <sum_abs+0x20>  // b.any
  40:   2598e3e0    ptrue   p0.s
  44:   04812042    uaddv   d2, p0, z2.s
  48:   1e260040    fmov    w0, s2
  4c:   d65f03c0    ret
  50:   1e2703e2    fmov    s2, wzr
  54:   1e260040    fmov    w0, s2
  58:   d65f03c0    ret

Notice how udot is used inside a fully masked loop.

I tested this patch on an aarch64 machine by bootstrapping the compiler and
running the checks.

Alejandro

gcc/Changelog:

2019-05-07  Alejandro Martinez  <alejandro.martinezvice...@arm.com>

	* config/aarch64/aarch64-sve.md (<su>abd<mode>_3): New define_expand.
	(aarch64_<su>abd<mode>_3): Likewise.
	(*aarch64_<su>abd<mode>_3): New define_insn.
	(<sur>sad<vsi2qi>): New define_expand.
	* config/aarch64/iterators.md: Added MAX_OPP attribute.
	* tree-vect-loop.c (use_mask_by_cond_expr_p): Add SAD_EXPR.
	(build_vect_cond_expr): Likewise.

gcc/testsuite/Changelog:

2019-05-07  Alejandro Martinez  <alejandro.martinezvice...@arm.com>

	* gcc.target/aarch64/sve/sad_1.c: New test for sum of absolute
	differences.

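To illustrate the remark about udot inside a fully masked loop, here is a
rough scalar sketch of what the generated loop computes. It is only a model
for reading the assembly, not code from the patch: the fixed vector length
VL and the names sum_abs_ref and sum_abs_masked_model are assumptions made
for this example (real SVE hardware chooses the vector length at run time).
The idea it shows is that inactive lanes of the second operand are replaced
with the first operand, so their absolute difference is zero and the
unpredicated udot against a vector of ones can safely accumulate every lane.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed vector length in bytes, for illustration only.  */
#define VL 16

/* Scalar reference: the loop from the cover letter.  */
int
sum_abs_ref (const uint8_t *x, const uint8_t *y, int n)
{
  int sum = 0;

  for (int i = 0; i < n; i++)
    sum += abs (x[i] - y[i]);

  return sum;
}

/* Hypothetical scalar model of the masked loop in the listing.  */
int
sum_abs_masked_model (const uint8_t *x, const uint8_t *y, int n)
{
  /* Models z2.s: one 32-bit accumulator per group of four bytes.  */
  uint32_t acc[VL / 4] = { 0 };

  for (int base = 0; base < n; base += VL)
    {
      for (int lane = 0; lane < VL; lane++)
	{
	  /* whilelo: a lane is active while it is still below n.  */
	  int active = base + lane < n;

	  /* ld1b with /z: inactive lanes load as zero.  */
	  uint8_t a = active ? x[base + lane] : 0;
	  uint8_t b = active ? y[base + lane] : 0;

	  /* sel: in inactive lanes take the first operand, so the
	     difference below is zero there.  */
	  if (!active)
	    b = a;

	  /* uabd: unsigned absolute difference on every lane.  */
	  uint8_t d = (uint8_t) (a > b ? a - b : b - a);

	  /* udot against a vector of ones: each byte is widened and
	     added into the 32-bit accumulator of its group.  */
	  acc[lane / 4] += (uint32_t) d * 1u;
	}
    }

  /* uaddv: horizontal sum of the accumulators.  */
  uint32_t total = 0;
  for (int i = 0; i < VL / 4; i++)
    total += acc[i];

  return (int) total;
}
```

Calling both functions on the same buffers with a length that is not a
multiple of VL is a simple way to check that the tail lanes contribute
nothing to the sum.
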
> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org>
> On Behalf Of Alejandro Martinez Vicente
> Sent: 11 February 2019 15:38
> To: James Greenhalgh <james.greenha...@arm.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>; Richard
> Sandiford <richard.sandif...@arm.com>; Richard Biener
> <richard.guent...@gmail.com>
> Subject: RE: [Aarch64][SVE] Vectorise sum-of-absolute-differences
>
> > -----Original Message-----
> > From: James Greenhalgh <james.greenha...@arm.com>
> > Sent: 06 February 2019 17:42
> > To: Alejandro Martinez Vicente <alejandro.martinezvice...@arm.com>
> > Cc: GCC Patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>; Richard
> > Sandiford <richard.sandif...@arm.com>; Richard Biener
> > <richard.guent...@gmail.com>
> > Subject: Re: [Aarch64][SVE] Vectorise sum-of-absolute-differences
> >
> > On Mon, Feb 04, 2019 at 07:34:05AM -0600, Alejandro Martinez Vicente
> > wrote:
> > > Hi,
> > >
> > > This patch adds support to vectorize sum of absolute differences
> > > (SAD_EXPR) using SVE. It also uses the new functionality to ensure
> > > that the resulting loop is masked. Therefore, it depends on
> > >
> > > https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00016.html
> > >
> > > Given this input code:
> > >
> > > int
> > > sum_abs (uint8_t *restrict x, uint8_t *restrict y, int n) {
> > >   int sum = 0;
> > >
> > >   for (int i = 0; i < n; i++)
> > >     {
> > >       sum += __builtin_abs (x[i] - y[i]);
> > >     }
> > >
> > >   return sum;
> > > }
> > >
> > > The resulting SVE code is:
> > >
> > > 0000000000000000 <sum_abs>:
> > >    0:   7100005f    cmp     w2, #0x0
> > >    4:   5400026d    b.le    50 <sum_abs+0x50>
> > >    8:   d2800003    mov     x3, #0x0                // #0
> > >    c:   93407c42    sxtw    x2, w2
> > >   10:   2538c002    mov     z2.b, #0
> > >   14:   25221fe0    whilelo p0.b, xzr, x2
> > >   18:   2538c023    mov     z3.b, #1
> > >   1c:   2518e3e1    ptrue   p1.b
> > >   20:   a4034000    ld1b    {z0.b}, p0/z, [x0, x3]
> > >   24:   a4034021    ld1b    {z1.b}, p0/z, [x1, x3]
> > >   28:   0430e3e3    incb    x3
> > >   2c:   0520c021    sel     z1.b, p0, z1.b, z0.b
> > >   30:   25221c60    whilelo p0.b, x3, x2
> > >   34:   040d0420    uabd    z0.b, p1/m, z0.b, z1.b
> > >   38:   44830402    udot    z2.s, z0.b, z3.b
> > >   3c:   54ffff21    b.ne    20 <sum_abs+0x20>  // b.any
> > >   40:   2598e3e0    ptrue   p0.s
> > >   44:   04812042    uaddv   d2, p0, z2.s
> > >   48:   1e260040    fmov    w0, s2
> > >   4c:   d65f03c0    ret
> > >   50:   1e2703e2    fmov    s2, wzr
> > >   54:   1e260040    fmov    w0, s2
> > >   58:   d65f03c0    ret
> > >
> > > Notice how udot is used inside a fully masked loop.
> > >
> > > I tested this patch in an aarch64 machine bootstrapping the compiler
> > > and running the checks.
> >
> > This doesn't give us much confidence in SVE coverage; unless you have
> > been running in an environment using SVE by default? Do you have some
> > set of workloads you could test the compiler against to ensure correct
> > operation of the SVE vectorization?
> >
> I tested it using an SVE model and a big set of workloads, including SPEC
> 2000, 2006 and 2017. On the plus side, nothing got broken. But impact on
> performance was very minimal (on average, a tiny gain over the whole set of
> workloads).
>
> I still want this patch (and the companion dot product patch) to make into
> the compiler because they are the first steps towards vectorising workloads
> using fully masked loops when the target ISA (like SVE) doesn't support
> masking in all the operations.
>
> Alejandro
>
> > >
> > > I admit it is too late to merge this into gcc 9, but I'm posting it
> > > anyway so it can be considered for gcc 10.
> >
> > Richard Sandiford has the call on whether this patch is OK for trunk
> > now or GCC 10. With the minimal testing it has had, I'd be
> > uncomfortable with it as a GCC 9 patch. That said, it is a fairly
> > self-contained pattern for the compiler and it would be good to see this
> > optimization in GCC 9.
> >
> > >
> > > Alejandro
> > >
> > >
> > > gcc/Changelog:
> > >
> > > 2019-02-04  Alejandro Martinez  <alejandro.martinezvice...@arm.com>
> > >
> > > 	* config/aarch64/aarch64-sve.md (<su>abd<mode>_3): New define_expand.
> > > 	(aarch64_<su>abd<mode>_3): Likewise.
> > > 	(*aarch64_<su>abd<mode>_3): New define_insn.
> > > 	(<sur>sad<vsi2qi>): New define_expand.
> > > 	* config/aarch64/iterators.md: Added MAX_OPP and max_opp attributes.
> > > 	Added USMAX iterator.
> > > 	* config/aarch64/predicates.md: Added aarch64_smin and aarch64_umin
> > > 	predicates.
> > > 	* tree-vect-loop.c (use_mask_by_cond_expr_p): Add SAD_EXPR.
> > > 	(build_vect_cond_expr): Likewise.
> > >
> > > gcc/testsuite/Changelog:
> > >
> > > 2019-02-04  Alejandro Martinez  <alejandro.martinezvice...@arm.com>
> > >
> > > 	* gcc.target/aarch64/sve/sad_1.c: New test for sum of absolute
> > > 	differences.
> >

sad_v3.patch