> -----Original Message-----
> From: Christopher Bazley <[email protected]>
> Sent: 19 December 2025 15:09
> To: [email protected]
> Cc: [email protected]; Tamar Christina
> <[email protected]>
> Subject: [PATCH v8 07/10] AArch64/SVE: Relax the expectations of the
> popcnt-sve test
>
> When predicated tails are enabled for basic block SLP vectorization,
> the assembly language generated by GCC when compiling popcnt-sve.c
> will change. Relax the regular expressions used by this test in
> preparation.
>
> Currently, analysis of f_v8hi succeeds with vector mode V16QI and the
> following GIMPLE is produced:
>
> vector(8) short unsigned intD.19 vect__1.18D.4648;
> ...
> vect__1.18_69 = MEM <vector(8) short unsigned intD.19>
> [(short unsigned intD.19 *)vectp.17_68 clique 1 base 1];
> vect_patt_60.19_70 = .POPCOUNT (vect__1.18_69);
>
> With predicated tails, analysis instead succeeds with a variable-length
> vector mode and the following GIMPLE is produced:
>
> vector([8,8]) short unsigned intD.19 vect__1.18D.4649;
> ...
> slp_mask_45 = .WHILE_ULT (0, 8, { 0, ... }); # VUSE <.MEM_25(D)>
> vect__1.18_46 = .MASK_LOAD (vectp.17_44, 16B, slp_mask_45, { 0, ... });
> vect_patt_36.19_47 = .POPCOUNT (vect__1.18_46);
>
> When lowered to RTL, the WHILE_ULT is replaced by
> reinterpretation of a V16QI as VNx8HI:
>
> (insn 7 4 8 2 (
> set (reg:V16QI 107) (mem:V16QI (reg/v/f:DI 103 [ b ]) [1 S16 A16])
> ) "gcc.target/aarch64/popcnt-sve.c":33:8 discrim 1 -1 (nil))
>
> (insn 8 7 9 2 (
> set (reg:VNx8HI 106) (subreg:VNx8HI (reg:V16QI 107) 0))
> "gcc.target/aarch64/popcnt-sve.c":33:8 discrim 1 -1 (nil))
>
> A mask is still required to lower POPCOUNT, so an all-ones mask
> is synthesized:
>
> (insn 9 8 10 2 (set (reg:VNx16BI 108)
> (const_vector:VNx16BI repeat [(const_int 1 [0x1])
> ])) "gcc.target/aarch64/popcnt-sve.c":69:8 discrim 1 -1
> (nil))
>
> (insn 10 9 11 2 (set (reg:VNx4SI 105)
> (unspec:VNx4SI [
> (subreg:VNx4BI (reg:VNx16BI 108) 0)
> (popcount:VNx4SI (reg:VNx4SI 106))
> ] UNSPEC_PRED_X))
> "gcc.target/aarch64/popcnt-sve.c":69:8 discrim 1 -1
> (nil))
>
> However, this mask is not the same as the specific-width mask
> currently expected by the tests.

The patch is OK and should be an improvement.

Accepting the all-ones mask is indeed safe because the else operand of
the masked load explicitly sets the inactive lanes to zero.

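
To illustrate the safety argument, here is a minimal sketch in Python. It is
a simplified model, not GCC or hardware behaviour verbatim: the function
names are hypothetical, and it assumes a register of 16 halfword lanes
(a 256-bit vector length) in which every lane beyond the 128-bit payload is
zero. It compares a "vl16"-style mask (8 active halfword lanes) with an
"all" mask.

```python
def popcount_lanes(lanes, active):
    """Merging-predicated popcount: inactive lanes keep their input value."""
    return [bin(x).count("1") if on else x for x, on in zip(lanes, active)]


def simulate(data, total_lanes, active_lanes):
    """Model load, predicated popcount, then a store of the low lanes.

    Assumption: lanes beyond the loaded data are zero.
    """
    lanes = list(data) + [0] * (total_lanes - len(data))
    active = [i < active_lanes for i in range(total_lanes)]
    result = popcount_lanes(lanes, active)
    # Only the lanes covering the original 128 bits are stored back.
    return result[:len(data)]


data = [0xFFFF, 0x0001, 0x0000, 0x00F0, 0x8000, 0x1234, 0x7FFF, 0xAAAA]
narrow = simulate(data, total_lanes=16, active_lanes=8)   # like "vl16"
full = simulate(data, total_lanes=16, active_lanes=16)    # like "all"
print(narrow == full)
```

In this model the extra active lanes only ever compute the popcount of
zero, and they are never stored, so both masks give the same result.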
Thanks,
Tamar
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/popcnt-sve.c: Update test expectations
> to allow both current and alternative valid mask
> specifications.
>
> ---
> gcc/testsuite/gcc.target/aarch64/popcnt-sve.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt-sve.c
> b/gcc/testsuite/gcc.target/aarch64/popcnt-sve.c
> index c3b4c69b4b4..117a5ca8f1b 100644
> --- a/gcc/testsuite/gcc.target/aarch64/popcnt-sve.c
> +++ b/gcc/testsuite/gcc.target/aarch64/popcnt-sve.c
> @@ -4,7 +4,7 @@
>
> /*
> ** f_v4hi:
> -** ptrue (p[0-7]).b, vl8
> +** ptrue (p[0-7]).b, (?:vl8|all)
> ** ldr d([0-9]+), \[x0\]
> ** cnt z\2.h, \1/m, z\2.h
> ** str d\2, \[x1\]
> @@ -21,7 +21,7 @@ f_v4hi (unsigned short *__restrict b, unsigned short
> *__restrict d)
>
> /*
> ** f_v8hi:
> -** ptrue (p[0-7]).b, vl16
> +** ptrue (p[0-7]).b, (?:vl16|all)
> ** ldr q([0-9]+), \[x0\]
> ** cnt z\2.h, \1/m, z\2.h
> ** str q\2, \[x1\]
> @@ -42,7 +42,7 @@ f_v8hi (unsigned short *__restrict b, unsigned short
> *__restrict d)
>
> /*
> ** f_v2si:
> -** ptrue (p[0-7]).b, vl8
> +** ptrue (p[0-7]).b, (?:vl8|all)
> ** ldr d([0-9]+), \[x0\]
> ** cnt z\2.s, \1/m, z\2.s
> ** str d\2, \[x1\]
> @@ -57,7 +57,7 @@ f_v2si (unsigned int *__restrict b, unsigned int
> *__restrict d)
>
> /*
> ** f_v4si:
> -** ptrue (p[0-7]).b, vl16
> +** ptrue (p[0-7]).b, (?:vl16|all)
> ** ldr q([0-9]+), \[x0\]
> ** cnt z\2.s, \1/m, z\2.s
> ** str q\2, \[x1\]
> @@ -74,7 +74,7 @@ f_v4si (unsigned int *__restrict b, unsigned int
> *__restrict d)
>
> /*
> ** f_v2di:
> -** ptrue (p[0-7]).b, vl16
> +** ptrue (p[0-7]).b, (?:vl16|all)
> ** ldr q([0-9]+), \[x0\]
> ** cnt z\2.d, \1/m, z\2.d
> ** str q\2, \[x1\]
> --
> 2.43.0
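
For anyone who wants to sanity-check the relaxed patterns above outside the
testsuite, here is a minimal sketch. It uses Python's re module as a
stand-in for the Tcl regular expressions that the testsuite applies, and it
assumes the whitespace after the mnemonic can be matched with \s+; the
helper name is hypothetical.

```python
import re

# Relaxed first line of the f_v8hi body, as in the patch. The group around
# the mask specifier is non-capturing, so the later backreferences (\1 for
# the predicate, \2 for the data register) keep their numbering.
RELAXED = re.compile(r"ptrue\s+(p[0-7]).b, (?:vl16|all)")


def accepts(line):
    """Return True if an assembly line matches the relaxed pattern."""
    return RELAXED.fullmatch(line.strip()) is not None


print(accepts("ptrue p7.b, vl16"))  # current output form
print(accepts("ptrue p7.b, all"))   # form seen with predicated tails
print(accepts("ptrue p7.b, vl8"))   # wrong width for f_v8hi
print(RELAXED.groups)               # number of capturing groups
```

A capturing group such as (vl16|all) would have matched the same lines but
shifted the register number from \2 to \3 in the lines that follow.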