On Sun, Aug 16, 2020 at 18:25:12 +0200, Paul B Mahol wrote:
> On 8/16/20, Paul B Mahol <one...@gmail.com> wrote:
> > Please help porting this to linux and 64bit calling convention.
>
> New patch attached.
>
> This one does not allocate stack on x32.

I wanted to benchmark on several machines (the newest I have is a Haswell; I also have an "Intel(R) Atom(TM) CPU D525 @ 1.80GHz" x86_64, and the one below is a Pentium 4 x86), but got stuck on the ancient x86.

Firstly, a superficial benchmark result on the Pentium 4:

$ time ffmpeg -i bigger_res.mov -map 0:v -f null -

Without patchset: speed=0.0331x (plus/minus a bit)
With patchset:    speed=0.0577x (plus/minus a bit)

I'll add benchmarks with my other systems, if desired.

Alas, with the patchset, the following command quickly terminates with "Illegal instruction" in ff_cfhd_horiz_filter_clip10_sse2 ():

$ ffmpeg -i MT_BeartoothHighway_1min_Cineform.avi -map 0:v -f null -

(and obviously it doesn't crash with "-cpuflags 0", or without the patchset). See the assembler dump below.

Compiler: icc (ICC) 14.0.3 20140422
Assembler: nasm-2.13.02

Assembly dump from gdb:

Dump of assembler code from 0x919572f to 0x919576f:
   0x0919572f <ff_cfhd_horiz_filter_clip10_sse2+47>:  movl   $0xbf0f03ff,(%ecx,%eax,8)
   0x09195736 <ff_cfhd_horiz_filter_clip10_sse2+54>:  xor    (%ecx),%al
   0x09195738 <ff_cfhd_horiz_filter_clip10_sse2+56>:  not    %ecx
   0x0919573a <ff_cfhd_horiz_filter_clip10_sse2+58>:  jmp    *0xf(%esi)
   0x0919573d <ff_cfhd_horiz_filter_clip10_sse2+61>:  outsb  %ds:(%esi),(%dx)
   0x0919573e <ff_cfhd_horiz_filter_clip10_sse2+62>:  (bad)
   0x0919573f <ff_cfhd_horiz_filter_clip10_sse2+63>:  pmaxsw 0x99e6da0,%xmm0
   0x09195747 <ff_cfhd_horiz_filter_clip10_sse2+71>:  pminsw 0x99e6db0,%xmm0
=> 0x0919574f <ff_cfhd_horiz_filter_clip10_sse2+79>:  pextrw $0x0,%xmm0,(%eax)
   0x09195755 <ff_cfhd_horiz_filter_clip10_sse2+85>:  movswl (%ecx),%esi
   0x09195758 <ff_cfhd_horiz_filter_clip10_sse2+88>:  imul   $0x5,%esi,%esi
   0x0919575b <ff_cfhd_horiz_filter_clip10_sse2+91>:  movswl 0x2(%ecx),%edi
   0x0919575f <ff_cfhd_horiz_filter_clip10_sse2+95>:  imul   $0x4,%edi,%edi
   0x09195762 <ff_cfhd_horiz_filter_clip10_sse2+98>:  add    %esi,%edi
   0x09195764 <ff_cfhd_horiz_filter_clip10_sse2+100>: movswl 0x4(%ecx),%esi
   0x09195768 <ff_cfhd_horiz_filter_clip10_sse2+104>: sub    %esi,%edi
   0x0919576a <ff_cfhd_horiz_filter_clip10_sse2+106>: add    $0x4,%edi
   0x0919576d <ff_cfhd_horiz_filter_clip10_sse2+109>: sar    $0x3,%edi
End of assembler dump.

CPU info:

barsnick@sunshine:~ > hwinfo --cpu
01: None 00.0: 10103 CPU
  [Created at cpu.457]
  Unique ID: rdCR.j8NaKXDZtZ6
  Hardware Class: cpu
  Arch: Intel
  Vendor: "GenuineIntel"
  Model: 15.2.9 "Intel(R) Pentium(R) 4 CPU 2.80GHz"
  Features: fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,pebs,bts,cid,xtpr
  Clock: 2800 MHz
  BogoMips: 5597.27
  Cache: 512 kb
  Units/Processor: 2
  Config Status: cfg=new, avail=yes, need=no, active=unknown

02: None 01.0: 10103 CPU
  [Created at cpu.457]
  Unique ID: wkFv.j8NaKXDZtZ6
  Hardware Class: cpu
  Arch: Intel
  Vendor: "GenuineIntel"
  Model: 15.2.9 "Intel(R) Pentium(R) 4 CPU 2.80GHz"
  Features: fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,pebs,bts,cid,xtpr
  Clock: 2800 MHz
  BogoMips: 27198.67
  Cache: 512 kb
  Units/Processor: 2
  Config Status: cfg=new, avail=yes, need=no, active=unknown

Cheers,
Moritz
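
P.S. In case it helps narrow this down: as far as I can tell from the instruction set reference, pextrw with a memory destination (as in the faulting "pextrw $0x0,%xmm0,(%eax)") is an SSE4.1 encoding, while SSE2 only provides the register-destination form. I have not checked this against the patch itself, so please treat it as an assumption. Below is a small Python sketch that checks the feature list reported by hwinfo for the SSE levels this CPU advertises. The helper names are mine and purely illustrative; the flag names ("pni" for SSE3, "sse4_1", ...) are the ones Linux uses.

```python
# Feature flags copied verbatim from the hwinfo output for the Pentium 4.
FEATURES = (
    "fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,mtrr,pge,mca,cmov,pat,"
    "pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,pebs,bts,cid,xtpr"
)

# SSE generations by their Linux flag names, oldest first.
# "pni" is how Linux reports SSE3.
SSE_LEVELS = ["sse", "sse2", "pni", "ssse3", "sse4_1", "sse4_2"]


def has_flag(features, flag):
    """Return True if `flag` appears as an exact entry in the comma-separated list."""
    return flag in set(features.split(","))


def supported_sse_levels(features):
    """Return the SSE levels from SSE_LEVELS that the feature list advertises."""
    return [level for level in SSE_LEVELS if has_flag(features, level)]


if __name__ == "__main__":
    print("SSE levels:", supported_sse_levels(FEATURES))
    # Assumption: the memory-destination pextrw needs SSE4.1.
    print("sse4_1 available:", has_flag(FEATURES, "sse4_1"))
```

For the feature list above this reports only sse and sse2, which would be consistent with the crash if the assumption about the encoding holds.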