On 28 Apr 2022, at 21:50, Martin Storsjö wrote:
> [...]
> Compared with the previously applied (and reverted) patch, here, you
> previously had "mov x17, #4". I guess that'd mean the function only ever
> produced 8 output rows, while it now uses the real height parameter? Was this
> change a no-op (height is always 8?) or was this another hidden bug in the
> previous implementation?
>
Yes, this was another bug in the previous implementation, which I've fixed in
both of the newer versions.

>> [...]
>> +        sqxtun          v6.8b, v20.8h
>> +        sqxtun          v7.8b, v21.8h
>> +        st1             {v6.8b}, [ x0], x2
>> +        st1             {v7.8b}, [x16], x2
>> +        subs            x17, x17, #1
>
> This could be "subs w6, w6, #2" and you wouldn't need the lsr instruction at
> all. And you could place the subs before the two st1 instructions to reduce
> latency between them a little. (The same thing goes for moving subs further
> away from the branch that uses its outcome in the previous patch too.) But as
> this is just a reapply of a previously committed and reverted patch, I guess
> it's fine this way too...

Will do before applying, if you're fine with it; it's not too complex a change.

> The patchset otherwise looks good to me, modulo the question about the
> difference to the previous patchset above.

-- 
J. Dekker
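For readers following the loop-counter discussion above, here is a minimal sketch in C (a model, not the actual assembly) of the three counter schemes mentioned in this thread. It assumes that w6 holds the height parameter, that height is positive and even, and that the loop branches back while the counter is nonzero; the quoted hunk shows neither the lsr nor the branch, so those details are assumptions. All function names are hypothetical.

```c
#include <assert.h>

/* Reverted patch: the counter was hard-coded ("mov x17, #4"), so the
 * function always wrote 4 * 2 = 8 rows regardless of height. */
static int rows_with_fixed_counter(int height)
{
    int rows = 0;
    int counter = 4;                 /* models "mov x17, #4" */
    (void)height;                    /* height is ignored: this is the bug */
    do {
        rows += 2;                   /* two st1 stores per iteration */
        counter -= 1;                /* models "subs x17, x17, #1" */
    } while (counter != 0);          /* assumed branch condition */
    return rows;
}

/* Patch as posted: a separate counter is derived as height / 2 (the lsr)
 * and decremented by 1 per iteration. */
static int rows_with_shifted_counter(int height)
{
    int rows = 0;
    int counter;
    assert(height > 0 && height % 2 == 0);
    counter = height >> 1;           /* models the lsr into x17 */
    do {
        rows += 2;
        counter -= 1;                /* models "subs x17, x17, #1" */
    } while (counter != 0);
    return rows;
}

/* Suggested form: decrement the height itself by 2, before the stores,
 * so no lsr and no extra counter register are needed. */
static int rows_with_direct_counter(int height)
{
    int rows = 0;
    assert(height > 0 && height % 2 == 0);
    do {
        height -= 2;                 /* models "subs w6, w6, #2" */
        rows += 2;                   /* the two st1 stores follow the subs */
    } while (height != 0);           /* flags from subs are still valid:
                                        st1 does not modify them */
    return rows;
}
```

Under these assumptions the shifted and direct forms write the same number of rows for any even height (for example, 16 rows when height is 16), while the fixed-counter form writes 8 rows no matter what height is passed.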