Hello,
I’ve addressed all the feedback on this so far. Is it ready to be pushed
upstream?
Here are my results from ‘checkasm’ (lower is better):
v210_unpack_c: 1636
v210_unpack_ssse3: 611
v210_unpack_avx: 601
v210_unpack_avx2: 423
I ran it 5 times and averaged the middle 3 resu
Hello,
I re-sent my AVX2 patch for v210 unpacking. My first attempt didn't get picked
up by the Patchwork list for some reason.
I installed Linux on a Broadwell laptop to utilize James Darnley's checkasm
patch for v210 decode. The results are below.
AVX2 gets a nice boost from replacing SHUF
I am submitting another patch. Please disregard this one.
-Mike
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Thanks for the feedback. You are right, I can use VPERMQ to free up a
register. I can also remove the PAND mask by doing PSLLD/PSRLD. That
eliminates the need for an x86-64 block.
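
To illustrate the PAND removal, here is a minimal scalar sketch in C. It is not
taken from the patch: the function names are hypothetical, and it assumes the
standard v210 layout of three 10-bit samples in bits 0-9, 10-19 and 20-29 of
each 32-bit word. It only shows why a left shift followed by a right shift
(as PSLLD/PSRLD would do per dword) can isolate the middle sample without a
mask constant.

```c
#include <assert.h>
#include <stdint.h>

/* Mask-based extraction (shift + PAND style): move the middle sample
 * down to bit 0, then clear everything above the low 10 bits. */
static uint32_t v210_mid_mask(uint32_t w)
{
    return (w >> 10) & 0x3FF;
}

/* Shift-only extraction (PSLLD + PSRLD style): shifting left by 12 pushes
 * bits 20..31 out of the word, then a logical right shift by 22 brings
 * bits 10..19 down to bits 0..9. No mask constant is needed. */
static uint32_t v210_mid_shift(uint32_t w)
{
    return (w << 12) >> 22;
}

/* Hypothetical self-check: both forms agree, even with padding bits set. */
static void v210_self_check(void)
{
    const uint32_t words[] = {
        0x00000000u, 0xFFFFFFFFu, 0x3FF00000u, 0x000FFC00u, 0xC0055400u
    };
    for (unsigned i = 0; i < sizeof(words) / sizeof(words[0]); i++)
        assert(v210_mid_mask(words[i]) == v210_mid_shift(words[i]));
}
```

In a SIMD version the same idea would presumably apply to every dword in the
register at once, which is what frees the register that held the mask.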
I tried the naive 'unrolled' version with no permute, and it was much slower,
about the same as the AVX/SSSE3 co