Aug 13, 2020, 18:23 by one...@gmail.com:

> Hi,
>
> patch attached.
>
> Please review and/or benchmark, especially .asm file.
>

I took a look. It's just the horizontal pass of an inverse 2-6 IDWT with
clipping.
The code is so simple that I wasn't able to find any obvious ways to improve
it, except perhaps replacing the "mov xq, 0" with "xor xq, xq". I think xor of
a register with itself is more universally recognized by x86 CPUs as "zeroing
a register", so they'll just allocate a pre-zeroed one. I could be wrong
though; it's simply what everyone uses.
Maybe call it idwt_26_horiz instead of the vague horiz_filter, since that's
what it is?

It's also called on a per-line basis everywhere, in a loop that consists of
1 call and 3 adds.
You could easily incorporate the loop into the function to reduce call
overhead if you want to (and I think you should look into it, but I won't
block the patch just for that). Registers might be a tight fit on 32-bit
systems then, but even using the stack should be faster than a hot function
call.
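
To make the suggestion concrete, here is a rough C sketch of the two call
shapes. All names, the argument list and the strides (counted in int16_t
elements) are assumptions made for illustration, and row_kernel is a trivial
stand-in, not the actual 2-6 filter from the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-row kernel. This is NOT the real
 * 2-6 filter; it only gives the row loop something to call. */
static void row_kernel(int16_t *out, const int16_t *low,
                       const int16_t *high, int width)
{
    for (int i = 0; i < width; i++) {
        out[2 * i]     = (int16_t)((low[i] + high[i]) >> 1);
        out[2 * i + 1] = (int16_t)((low[i] - high[i]) >> 1);
    }
}

/* Per-line shape: the caller owns the row loop, so every row costs
 * one function call plus three pointer adds. */
static void filter_rows_caller_loop(int16_t *out, ptrdiff_t out_stride,
                                    const int16_t *low, ptrdiff_t low_stride,
                                    const int16_t *high, ptrdiff_t high_stride,
                                    int width, int height)
{
    for (int y = 0; y < height; y++) {
        row_kernel(out, low, high, width);
        out  += out_stride;
        low  += low_stride;
        high += high_stride;
    }
}

/* Suggested shape: the row loop lives inside the function, so the call
 * overhead is paid once per plane instead of once per row. */
static void filter_plane(int16_t *out, ptrdiff_t out_stride,
                         const int16_t *low, ptrdiff_t low_stride,
                         const int16_t *high, ptrdiff_t high_stride,
                         int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int i = 0; i < width; i++) {
            out[2 * i]     = (int16_t)((low[i] + high[i]) >> 1);
            out[2 * i + 1] = (int16_t)((low[i] - high[i]) >> 1);
        }
        out  += out_stride;
        low  += low_stride;
        high += high_stride;
    }
}
```

In an assembly version the second shape needs the three strides and the
height as extra arguments, which is where the register pressure on 32-bit
systems would come from.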

Aside from those nitpicks, LGTM.

SIMDing the remaining DSP function (interlaced_vertical_filter) should help a
lot too, though that function is pretty much trivial, since it's just an
average + deinterleave.
That function should definitely have its 3-line loop incorporated into it,
however, as you'll have no shortage of registers, even on 32-bit systems.
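
As a rough idea of what that could look like in C before writing the SIMD,
here is a sketch under stated assumptions: the exact arithmetic (half-sum and
half-difference of the two bands), the 10-bit clip, the function name, the
clip helper and the argument list are all guesses made for illustration, not
taken from the patch. Strides are counted in int16_t elements.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp v to the unsigned range of `bits` bits. */
static int16_t clip_uintp2(int v, int bits)
{
    const int max = (1 << bits) - 1;
    return (int16_t)(v < 0 ? 0 : v > max ? max : v);
}

/* Assumed behaviour: each low/high row pair yields two output rows,
 * the half-difference (even field) and the half-sum (odd field).
 * The row loop is folded into the function, so there is one call
 * per plane rather than one per row. */
static void interlaced_vertical_filter_plane(int16_t *out, ptrdiff_t out_stride,
                                             const int16_t *low, ptrdiff_t low_stride,
                                             const int16_t *high, ptrdiff_t high_stride,
                                             int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int i = 0; i < width; i++) {
            int even = (low[i] - high[i]) / 2;
            int odd  = (low[i] + high[i]) / 2;
            out[i]              = clip_uintp2(even, 10);
            out[i + out_stride] = clip_uintp2(odd, 10);
        }
        out  += 2 * out_stride; /* two output rows written per input row */
        low  += low_stride;
        high += high_stride;
    }
}
```

The inner loop only needs the three pointers, a counter and a couple of
temporaries, which is why running out of registers is unlikely here.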
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
