Thanks for calling that out. It looks like I was cross-compiling for 32-bit
incorrectly from my 64-bit host. I've reproduced the failure and submitted a
v2 with the fix.

If you're still seeing build failures even after v2, can you also provide
more details on how you are building so I can reproduce and fix?

- Chris

On Wed, Jul 20, 2022 at 6:17 AM Michael Niedermayer <mich...@niedermayer.cc> wrote:

> On Tue, Jul 19, 2022 at 09:41:17PM -0700, Chris Phlipot wrote:
> > Add a new version of yadif_filter_line performed using packed bytes
> > instead of the packed words used by the current implementation. As
> > a result this implementation runs almost 2x as fast as the current
> > fastest SSSE3 implementation.
> >
> > This implementation is created from scratch based on the C code, with
> > the goal of keeping all intermediate values within 8 bits so that
> > the vectorized code can be computed using packed bytes. Differences
> > are as follows:
> > - Use algorithms to compute avg and abs difference using only 8-bit
> >   intermediate values.
> > - Reworked the mode 1 code by applying various mathematical identities
> >   to keep all intermediate values within 8 bits.
> > - Attempt to compute the spatial score using only 8 bits. The actual
> >   spatial score fits within this range 97% (content dependent) of the
> >   time for the entire 128-bit xmm vector. In the case that the spatial
> >   score needs more than 8 bits to be represented, we detect this case
> >   and recompute the spatial score using 16-bit packed words instead.
> >
> > In 3% of cases the spatial_score will need more than 8 bits to store,
> > so we have a slow path, where the spatial score is computed using
> > packed words instead.
> >
> > This implementation is currently limited to x86_64 due to the number
> > of registers required. x86_32 is possible, but the performance benefit
> > over the existing SSSE3 implementation is not as great, due to all of the
> > stack spills that would result from having far fewer registers. ASM was
> > not generated for the 32-bit variant due to limited ROI, as most AVX
> > users are likely on a 64-bit OS at this point and 32-bit users would
> > lose out on most of the performance benefit.
> >
> > Signed-off-by: Chris Phlipot <cphlip...@gmail.com>
>
> there's no need to support 32-bit, but the ffmpeg build must not break
> on linux x86-32
>
> src/libavfilter/x86/vf_yadif_x64.asm:145: error: impossible combination of address sizes
> src/libavfilter/x86/vf_yadif_x64.asm:145: error: invalid effective address
> src/libavfilter/x86/vf_yadif_x64.asm:146: error: impossible combination of address sizes
> src//libavutil/x86/x86inc.asm:1399: ... from macro `movdqu' defined here
> src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
> src//libavutil/x86/x86inc.asm:1717: ... from macro `vmovdqu' defined here
>
> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Everything should be made as simple as possible, but not simpler.
> -- Albert Einstein
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
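
For anyone following the thread without the patch open: the quoted commit
message mentions computing avg and abs difference using only 8-bit
intermediate values. Below is a minimal scalar sketch in C of two well-known
identities that achieve this. It is an illustration only and is not taken
from the patch, which may use different instruction sequences; the helper
names (sub_sat_u8, abs_diff_u8, avg_floor_u8, self_check) are hypothetical.
sub_sat_u8 models a single byte lane of a saturating unsigned subtract such
as PSUBUSB.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating unsigned subtract: scalar model of one byte lane of PSUBUSB.
 * Hypothetical helper, not a function from the patch. */
static uint8_t sub_sat_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : 0);
}

/* |a - b| without a signed or 9-bit intermediate: one of the two saturated
 * differences is always 0, so OR-ing them yields the absolute difference. */
static uint8_t abs_diff_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(sub_sat_u8(a, b) | sub_sat_u8(b, a));
}

/* floor((a + b) / 2) without forming the 9-bit sum, using
 * a + b == 2 * (a & b) + (a ^ b). */
static uint8_t avg_floor_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a & b) + ((a ^ b) >> 1));
}

/* Exhaustively compare both helpers against plain int arithmetic. */
static void self_check(void)
{
    for (int a = 0; a < 256; a++) {
        for (int b = 0; b < 256; b++) {
            int ref_abs = a > b ? a - b : b - a;
            int ref_avg = (a + b) >> 1;
            assert(abs_diff_u8((uint8_t)a, (uint8_t)b) == ref_abs);
            assert(avg_floor_u8((uint8_t)a, (uint8_t)b) == ref_avg);
        }
    }
}
```

Calling self_check() walks all 65536 input pairs, so a mismatch with the
wider reference arithmetic would trip an assert. The truncating average
matters because PAVGB rounds up, computing (a + b + 1) >> 1, which differs
from a C-style (a + b) >> 1 whenever the sum is odd.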