On 28 Apr 2022, at 21:50, Martin Storsjö wrote:

> [...]
> Compared with the previously applied (and reverted) patch, here, you 
> previously had "mov x17, #4". I guess that'd mean the function only ever 
> produced 8 output rows, while it now uses the real height parameter? Was this 
> change a no-op (height is always 8?) or was this another hidden bug in the 
> previous implementation?
>

Yes, this was another bug in the previous implementation, which I've fixed in 
both of the newer versions.

>> [...]
>> +        sqxtun          v6.8b, v20.8h
>> +        sqxtun          v7.8b, v21.8h
>> +        st1             {v6.8b}, [ x0], x2
>> +        st1             {v7.8b}, [x16], x2
>> +        subs            x17, x17, #1
>
> This could be "subs w6, w6, #2" and you wouldn't need the lsr instruction at 
> all. And you could place the subs before the two st1 instructions to reduce 
> latency between them a little. (The same thing goes for moving subs further 
> away from the branch that uses its outcome in the previous patch too.) But as 
> this is just a reapply of a previously committed and reverted patch, I guess 
> it's fine this way too...

Will do before applying, if you're fine with it; it's not too complex a change.
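
For reference, here is a rough C model of the two loop-counter schemes, only to 
check that dropping the lsr and decrementing the height by 2 gives the same 
number of output rows. This is a sketch, not the actual assembly: the function 
names are hypothetical, and it assumes the height (w6) is even and positive and 
that the loop branches back while the counter is non-zero.

```c
#include <assert.h>

/* Model of the current loop tail: a separate counter (x17 in the patch)
 * is derived from the height with a shift and decremented by 1 per
 * iteration. Assumption: the shift is a right shift by 1 of the height. */
int rows_with_shifted_counter(int height)
{
    assert(height > 0 && height % 2 == 0);  /* assumed: even, positive height */
    int rows = 0;
    int counter = height >> 1;              /* the lsr that sets up x17 */
    do {
        rows += 2;                          /* two st1 stores per iteration */
        counter -= 1;                       /* subs x17, x17, #1 */
    } while (counter != 0);
    return rows;
}

/* Model of the suggested loop tail: the height itself is decremented by 2,
 * ahead of the two stores, so no shifted counter is needed. */
int rows_with_direct_counter(int height)
{
    assert(height > 0 && height % 2 == 0);  /* assumed: even, positive height */
    int rows = 0;
    do {
        height -= 2;                        /* subs w6, w6, #2 */
        rows += 2;                          /* two st1 stores per iteration */
    } while (height != 0);
    return rows;
}
```

Both functions return the height they were given for any even, positive input, 
so the change should not alter how many rows are written. Placing the subs 
ahead of the stores is safe in the real code because st1 does not touch the 
condition flags.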

> The patchset otherwise looks good to me, modulo the question about the 
> difference to the previous patchset above.

--
J. Dekker