Tamar Christina <tamar.christ...@arm.com> writes: >> -----Original Message----- >> From: Richard Biener <rguent...@suse.de> >> Sent: Wednesday, April 23, 2025 9:31 AM >> To: Tamar Christina <tamar.christ...@arm.com> >> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Sandiford >> <richard.sandif...@arm.com> >> Subject: Re: [PATCH]middle-end: Add new "max" vector cost model >> >> On Wed, 23 Apr 2025, Tamar Christina wrote: >> >> > Hi All, >> > >> > This patch proposes a new vector cost model called "max". The cost model >> > is an >> > intersection between two of our existing cost models. Like `unlimited` it >> > disables the costing vs scalar and assumes all vectorization to be >> > profitable. >> > >> > But unlike unlimited it does not fully disable the vector cost model. That >> > means that we still perform comparisons between vector modes. >> > >> > As an example, the following: >> > >> > void >> > foo (char *restrict a, int *restrict b, int *restrict c, >> > int *restrict d, int stride) >> > { >> > if (stride <= 1) >> > return; >> > >> > for (int i = 0; i < 3; i++) >> > { >> > int res = c[i]; >> > int t = b[i * stride]; >> > if (a[i] != 0) >> > res = t * d[i]; >> > c[i] = res; >> > } >> > } >> > >> > compiled with -O3 -march=armv8-a+sve -fvect-cost-model=dynamic fails to >> > vectorize as it assumes scalar would be faster, and with >> > -fvect-cost-model=unlimited it picks a vector type that's so big that the >> > large >> > sequence generated is working on mostly inactive lanes: >> > >> > ... >> > and p3.b, p3/z, p4.b, p4.b >> > whilelo p0.s, wzr, w7 >> > ld1w z23.s, p3/z, [x3, #3, mul vl] >> > ld1w z28.s, p0/z, [x5, z31.s, sxtw 2] >> > add x0, x5, x0 >> > punpklo p6.h, p6.b >> > ld1w z27.s, p4/z, [x0, z31.s, sxtw 2] >> > and p6.b, p6/z, p0.b, p0.b >> > punpklo p4.h, p7.b >> > ld1w z24.s, p6/z, [x3, #2, mul vl] >> > and p4.b, p4/z, p2.b, p2.b >> > uqdecw w6 >> > ld1w z26.s, p4/z, [x3] >> > whilelo p1.s, wzr, w6 >> > mul z27.s, p5/m, z27.s, z23.s >> > ld1w z29.s, p1/z, [x4, z31.s, sxtw 2] >> > punpkhi p7.h, p7.b >> > mul z24.s, p5/m, z24.s, z28.s >> > and p7.b, p7/z, p1.b, p1.b >> > mul z26.s, p5/m, z26.s, z30.s >> > ld1w z25.s, p7/z, [x3, #1, mul vl] >> > st1w z27.s, p3, [x2, #3, mul vl] >> > mul z25.s, p5/m, z25.s, z29.s >> > st1w z24.s, p6, [x2, #2, mul vl] >> > st1w z25.s, p7, [x2, #1, mul vl] >> > st1w z26.s, p4, [x2] >> > ... >> > >> > With -fvect-cost-model=max you get more reasonable code: >> > >> > foo: >> > cmp w4, 1 >> > ble .L1 >> > ptrue p7.s, vl3 >> > index z0.s, #0, w4 >> > ld1b z29.s, p7/z, [x0] >> > ld1w z30.s, p7/z, [x1, z0.s, sxtw 2] >> > ptrue p6.b, all >> > cmpne p7.b, p7/z, z29.b, #0 >> > ld1w z31.s, p7/z, [x3] >> > mul z31.s, p6/m, z31.s, z30.s >> > st1w z31.s, p7, [x2] >> > .L1: >> > ret >> > >> > This model has been useful internally for performance exploration and cost- >> model >> > validation. It allows us to force realistic vectorization overriding the >> > cost >> > model to be able to tell whether it's correct wrt to profitability. >> > >> > Bootstrapped Regtested on aarch64-none-linux-gnu, >> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu >> > -m32, -m64 and no issues. >> > >> > Ok for master? >> >> Hmm. I don't like another cost model. Instead how about changing >> 'unlimited' to still iterate through vector sizes? Cost modeling >> is really about vector vs. scalar, not vector vs. vector which is >> completely under target control. Targets should provide a way >> to limit iteration, like aarch64 has with the aarch64-autovec-preference >> --param or x86 has with -mprefer-vector-width. >> > > I'm ok with changing 'unlimited' if that's preferred, but I do want to point > out that we don't have enough control with current --param or -m options > to simulate all cases. > > For instance for SVE there's way for us to force a smaller type to be used > and thus force an unpacking to happen. Or there's no way to force an > unrolling with Adv. SIMD. > > Basically there's not enough control over the VF to exercise some tests > reliably. Some tests explicitly relied on unlimited just picking the first > mode.
FWIW, adding extra AArch64 --params sounds ok to me. The ones we have were just added on an as-needed/as-wanted basis, rather than as an attempt to be complete. After the aarch64-autovec-preference backward-compatibility controversy, we should consider whether what we add is something that is intended for developers and can be taken away at any time (--param), or whether it's something that we promise to support going forward (-m). Thanks, Richard