On Sun, Jan 26, 2025 at 01:29:38AM +0200, Martin Storsjö wrote:
> With the following diff:
>
> @@ -40,8 +41,8 @@ function ff_aac_quant_bands_neon, export=1
>          movi            v5.4s, 0x80, lsl #24
>  .irp signed,1,0
>  \signed:
> -        subs            w3, w3, #4
>          ld1             {v3.4s}, [x2], #16
> +        subs            w3, w3, #4
>          fmul            v3.4s, v3.4s, v0.s[0]
>  .if \signed
>          ld1             {v4.4s}, [x1], #16
>
> I'm getting the following improvement:
>
> Before:                     Cortex A53      A72      A78
> quant_bands_signed_neon:        5661.0   2383.2   1113.2
> quant_bands_unsigned_neon:      5401.5   2067.8    811.8
> After:
> quant_bands_signed_neon:        5402.5   2385.5   1090.0
> quant_bands_unsigned_neon:      5145.5   2067.8    809.5
>
> No change on the A72 here, but apparently a (very) small improvement on the
> A78, and a bigger improvement on the A53 as expected.
>
> If you don't mind these changes, we could land the change with that tweaked.
> (I guess the numbers in the commit message could be re-measured, but I'm not
> sure if they change enough to make much of a difference there, especially on
> the cores you've measured on.)
>
> // Martin
I don't mind these changes; I'm perfectly fine with applying any improvements
on top of the patch. The speeds on the A78 and the x13s did not change
significantly, so the initial benchmark values can be used.

Krzysztof
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".