Re: [FFmpeg-devel] [PATCH] libavcodec Adding ff_v210_planar_unpack AVX2

2019-03-26 Thread Mike Stoner via ffmpeg-devel
Hello, I’ve addressed all the feedback on this so far, and I’m wondering whether it is ready to be pushed upstream. Here are my results from ‘checkasm’ (lower is better):
v210_unpack_c: 1636
v210_unpack_ssse3: 611
v210_unpack_avx: 601
v210_unpack_avx2: 423
I ran it 5 times and averaged the middle 3 resu
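The averaging described above (5 runs, keep the middle 3) and the relative gains can be computed with a couple of small helpers. This is a hypothetical sketch, not part of checkasm: the names trimmed_mean and speedup are illustrative, and it assumes the per-run timings have been collected separately.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparison callback: ascending order of doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts the n timings in place, drops the single fastest and slowest run,
 * and returns the mean of the remaining n - 2 values
 * (for n = 5 this is "the middle 3"). */
static double trimmed_mean(double *t, size_t n)
{
    assert(n >= 3);
    qsort(t, n, sizeof(*t), cmp_double);
    double sum = 0.0;
    for (size_t i = 1; i + 1 < n; i++)
        sum += t[i];
    return sum / (double)(n - 2);
}

/* Speedup of a candidate over a baseline when both are cycle counts
 * (lower is better): values above 1.0 mean the candidate is faster. */
static double speedup(double baseline, double candidate)
{
    assert(candidate > 0.0);
    return baseline / candidate;
}
```

Applied to the figures above, speedup(1636, 423) is about 3.87 for AVX2 over C, and speedup(611, 423) is about 1.44 for AVX2 over SSSE3.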

Re: [FFmpeg-devel] [PATCH] libavcodec Adding ff_v210_planar_unpack AVX2

2019-03-16 Thread Mike Stoner
Hello, I resent my AVX2 patch for v210 unpacking. My first attempt didn't get picked up by Patchwork for some reason. I installed Linux on a Broadwell laptop to use James Darnley's checkasm patch for v210 decode. The results are below. AVX2 gets a nice boost from replacing SHUF
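For reference, the work the SIMD routines accelerate can be sketched in scalar C. This sketch is modeled on the v210 word layout handled by FFmpeg's C fallback, but the names read_pixels and v210_unpack_scalar are hypothetical. It assumes the width is a multiple of 6 and that the source words are already in host byte order (v210 stores little-endian 32-bit words).

```c
#include <assert.h>
#include <stdint.h>

/* Stores the three 10-bit components packed in one v210 word:
 * bits 0-9 -> *a, bits 10-19 -> *b, bits 20-29 -> *c (bits 30-31 unused).
 * Each destination pointer is advanced after its store. */
static void read_pixels(uint32_t w, uint16_t **a, uint16_t **b, uint16_t **c)
{
    *(*a)++ = (uint16_t)(w & 0x3FF);
    *(*b)++ = (uint16_t)((w >> 10) & 0x3FF);
    *(*c)++ = (uint16_t)((w >> 20) & 0x3FF);
}

/* Unpacks `width` pixels of one v210 line into planar Y, U and V.
 * Every 4 words carry 6 luma and 3 + 3 chroma samples in the order
 * Cb Y Cr | Y Cb Y | Cr Y Cb | Y Cr Y. */
static void v210_unpack_scalar(const uint32_t *src, uint16_t *y, uint16_t *u,
                               uint16_t *v, int width)
{
    assert(width % 6 == 0);
    for (int i = 0; i < width; i += 6) {
        read_pixels(*src++, &u, &y, &v);
        read_pixels(*src++, &y, &u, &y);
        read_pixels(*src++, &v, &y, &u);
        read_pixels(*src++, &y, &v, &y);
    }
}
```

The shuffles in the SSSE3/AVX/AVX2 versions exist to gather these scattered 10-bit fields into contiguous Y, U and V lanes, which the scalar loop does one component at a time.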

Re: [FFmpeg-devel] [PATCH] Revised ff_v210_planar_unpack AVX2

2019-03-12 Thread Mike Stoner
I am submitting another patch. Please disregard this one. -Mike

Re: [FFmpeg-devel] [PATCH] Added ff_v210_planar_unpack_aligned_avx2

2019-03-06 Thread Mike Stoner
Thanks for the feedback. You are right: I can use VPERMQ to free up a register. I can also remove the PAND mask by using PSLLD/PSRLD instead, which eliminates the need for an x86-64 block. I tried the naive 'unrolled' version with no permute, and it was much slower, about the same as the AVX/SSSE3 co
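A scalar illustration of why a shift pair can stand in for the AND mask: shifting left discards the bits above the 10-bit field, and a logical right shift then discards the bits below it while zero-filling. The function names are hypothetical; the real patch applies the same idea per 32-bit lane with PSLLD/PSRLD, where dropping the mask presumably means no register has to hold the mask constant.

```c
#include <assert.h>
#include <stdint.h>

/* Extracts the 10-bit field starting at bit `pos` with a shift and a mask
 * (the PSRLD + PAND style). */
static uint32_t field_mask(uint32_t w, unsigned pos)
{
    assert(pos == 0 || pos == 10 || pos == 20);
    return (w >> pos) & 0x3FFu;
}

/* Same field with two shifts and no mask constant (the PSLLD + PSRLD style):
 * shifting left by 22 - pos moves the field to bits 22-31, dropping
 * everything above it; the logical right shift by 22 drops everything
 * below it and zero-fills the top. */
static uint32_t field_shift(uint32_t w, unsigned pos)
{
    assert(pos == 0 || pos == 10 || pos == 20);
    return (w << (22 - pos)) >> 22;
}
```

Both functions return identical results for every v210 field position, including when the two unused top bits of the word are set.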