Apologies for this. When I added mmx to the yasm file, I added a macro for the stores that selects mova for mmx and movdqu for the others. However, if cpuflag(mmx) evaluates to true for every instruction set extension, not only mmx, so the sse3 and avx2 versions also ended up with the aligned store. I have replaced it with if notcpuflag(sse3).
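
To illustrate why the cpuflag(mmx) condition misfired, here is a small C model of the flag test as I understand it from x86inc.asm: the flag word for each instruction set also carries the bits of the sets below it, so a test for mmx succeeds in an avx2 build too. The bit values, the shortened list of instruction sets, and the function names below are made up for this sketch and are not taken from x86inc.asm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values. The only property that matters here is that
 * each level also includes the bits of every level below it. */
enum {
    FLAG_MMX    = (1u << 0),
    FLAG_MMXEXT = (1u << 1) | FLAG_MMX,
    FLAG_SSE    = (1u << 2) | FLAG_MMXEXT,
    FLAG_SSE2   = (1u << 3) | FLAG_SSE,
    FLAG_SSE3   = (1u << 4) | FLAG_SSE2,
    FLAG_AVX2   = (1u << 5) | FLAG_SSE3
};

/* True when every bit of `wanted` is present in the build's flag word. */
static bool cpuflag(uint32_t build_flags, uint32_t wanted)
{
    return (build_flags & wanted) == wanted;
}

static bool notcpuflag(uint32_t build_flags, uint32_t wanted)
{
    return !cpuflag(build_flags, wanted);
}

/* Original condition: meant "mmx only", but matches every build. */
static bool aligned_store_before_fix(uint32_t build_flags)
{
    return cpuflag(build_flags, FLAG_MMX);
}

/* Replacement condition: matches only builds below sse3. */
static bool aligned_store_after_fix(uint32_t build_flags)
{
    return notcpuflag(build_flags, FLAG_SSE3);
}

/* Small self-check of the model. */
void check_store_selection(void)
{
    assert(aligned_store_before_fix(FLAG_AVX2));  /* the bug */
    assert(!aligned_store_after_fix(FLAG_AVX2));  /* unaligned store */
    assert(!aligned_store_after_fix(FLAG_SSE3));  /* unaligned store */
    assert(aligned_store_after_fix(FLAG_MMX));    /* aligned store */
    assert(aligned_store_after_fix(FLAG_MMXEXT)); /* aligned store */
}
```

In this model the mmx and mmxext builds keep the aligned store, while the sse3 and avx2 builds fall through to the unaligned one, which is the intended split.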
The alignment in the checkasm test has been changed to 8 from 32 so that the test catches problems with alignment.

On Thu, Jan 14, 2021 at 1:11 AM Michael Niedermayer <mich...@niedermayer.cc> wrote:

> On Mon, Jan 11, 2021 at 05:46:31PM +0100, Alan Kelly wrote:
> > ---
> >  Fixes a bug where if there is no offset and a tail which is not processed by the
> >  sse3/avx2 version the dither is modified
> >  Deletes mmx/mmxext yuv2yuvX version from swscale_template and adds it
> >  to yuv2yuvX.asm to reduce code duplication and so that it may be used
> >  to process the tail from the larger cardinal simd versions.
> >  src argument of yuv2yuvX_* is now srcOffset, so that tails and offsets
> >  are accounted for correctly.
> >  Changes input size in checkasm so that this corner case is tested.
> >
> >  libswscale/x86/Makefile           |   1 +
> >  libswscale/x86/swscale.c          | 130 ++++++++++++----------------
> >  libswscale/x86/swscale_template.c |  82 ------------------
> >  libswscale/x86/yuv2yuvX.asm       | 136 ++++++++++++++++++++++++++++++
> >  tests/checkasm/sw_scale.c         | 100 ++++++++++++++++++++++
> >  5 files changed, 291 insertions(+), 158 deletions(-)
> >  create mode 100644 libswscale/x86/yuv2yuvX.asm
>
> This seems to be crashing again unless i messed up testing
>
> (gdb) disassemble $rip-32,$rip+32
> Dump of assembler code from 0x555555572f02 to 0x555555572f42:
>    0x0000555555572f02 <ff_yuv2yuvX_avx2+162>:   int    $0x71
>    0x0000555555572f04 <ff_yuv2yuvX_avx2+164>:   out    %al,$0x3
>    0x0000555555572f06 <ff_yuv2yuvX_avx2+166>:   vpsraw $0x3,%ymm1,%ymm1
>    0x0000555555572f0b <ff_yuv2yuvX_avx2+171>:   vpackuswb %ymm4,%ymm3,%ymm3
>    0x0000555555572f0f <ff_yuv2yuvX_avx2+175>:   vpackuswb %ymm1,%ymm6,%ymm6
>    0x0000555555572f13 <ff_yuv2yuvX_avx2+179>:   mov    (%rdi),%rdx
>    0x0000555555572f16 <ff_yuv2yuvX_avx2+182>:   vpermq $0xd8,%ymm3,%ymm3
>    0x0000555555572f1c <ff_yuv2yuvX_avx2+188>:   vpermq $0xd8,%ymm6,%ymm6
> => 0x0000555555572f22 <ff_yuv2yuvX_avx2+194>:   vmovdqa %ymm3,(%rcx,%rax,1)
>    0x0000555555572f27 <ff_yuv2yuvX_avx2+199>:   vmovdqa %ymm6,0x20(%rcx,%rax,1)
>    0x0000555555572f2d <ff_yuv2yuvX_avx2+205>:   add    $0x40,%rax
>    0x0000555555572f31 <ff_yuv2yuvX_avx2+209>:   mov    %rdi,%rsi
>    0x0000555555572f34 <ff_yuv2yuvX_avx2+212>:   cmp    %r8,%rax
>    0x0000555555572f37 <ff_yuv2yuvX_avx2+215>:   jb     0x555555572eae <ff_yuv2yuvX_avx2+78>
>    0x0000555555572f3d <ff_yuv2yuvX_avx2+221>:   vzeroupper
>    0x0000555555572f40 <ff_yuv2yuvX_avx2+224>:   retq
>    0x0000555555572f41 <ff_yuv2yuvX_avx2+225>:   nopw   %cs:0x0(%rax,%rax,1)
>
> rax            0x0                 0
> rbx            0x30                48
> rcx            0x55555583f470      93824995292272
> rdx            0x55555585e500      93824995419392
>
> #0  0x0000555555572f22 in ff_yuv2yuvX_avx2 ()
> #1  0x00005555555724ee in yuv2yuvX_avx2 ()
> #2  0x000055555556b4f6 in chr_planar_vscale ()
> #3  0x0000555555566d41 in swscale ()
> #4  0x0000555555568284 in sws_scale ()
>
> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> What does censorship reveal? It reveals fear. -- Julian Assange
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".