Michael Niedermayer <mich...@niedermayer.cc> wrote on Wed, Mar 8, 2023 at 04:45:
>
> On Tue, Mar 07, 2023 at 05:08:27PM +0800, Junxian Zhu wrote:
> > From: Junxian Zhu <zhujunx...@oss.cipunited.com>
> >
> > Rewrite mid_pred function in generic mathops.h, reduce branch jump to
> > improve performance. And because nowadays new version compiler can compile
> > enough short asmbbely code as handwritting in these function, so remove
> > specified optimized mips inline asmbbely mathops.h.
>
> as you write, that it improves performance
> what speed effect does this have exactly?
> thx
>
I tested the performance using this code:

```
#include <stdio.h>
#include <time.h>
#include <stdlib.h>

#define FFMIN(a, b) ((a) > (b) ? (b) : (a))
#define FFMAX(a, b) ((a) > (b) ? (a) : (b))

/* Build with -DOLD=1 to time the old code; the default is the new code. */
int mid_pred(int a, int b, int c)
{
#if OLD
    if (a > b) {
        if (c > b) {
            if (c > a) b = a;
            else       b = c;
        }
    } else {
        if (b > c) {
            if (c > a) b = c;
            else       b = a;
        }
    }
    return b;
#else
    int t0, t1, t2, t3;
    t0 = (a > b) ? b : a;     /* min(a, b) */
    t1 = (a > b) ? a : b;     /* max(a, b) */
    t2 = (t0 > c) ? t0 : c;   /* max(t0, c) */
    t3 = (t1 > t2) ? t2 : t1; /* min(t1, t2) */
    return t3;
#endif
}

int main()
{
    int a[1024], b[1024], c[1024], d[1024];
    int j; /* declared outside the loop so the final printf can use it */

    srand(time(NULL));
    for (int i = 0; i < 1024; i++) {
        a[i] = rand();
        b[i] = rand();
        c[i] = rand();
    }
    /* rand() in the bound and in the printf keeps the compiler from
       optimizing the loop away */
    for (j = 0; j < 1e7 + rand() % 2; j++)
        for (int i = 0; i < 1024; i++)
            d[i] = mid_pred(a[i], b[i], c[i]);
    printf("%d, %d\n", d[rand() % 1024], j);
}
```

Results (run time, lower is better):

On MacOS 13.2 with Apple M1:
    old code 2.1s, new code 2.3s
On Cavium ThunderX / arm64 (GCC 10.2.1 -O3):
    old code 52.7s, new code 37.8s
On Loongson 3A4000 / mips64el (GCC 10.2.1 -O3):
    old code 90s, new code 5s
On Intel(R) Xeon(R) CPU E7-4820 v4 @ 2.00GHz (GCC 10.2.1 -O3):
    old code 14.4s, new code 15.4s
On SF19A2890 / MIPS interAptiv (GCC 10.2.1 -O3):
    old code 314s, new code 39.3s
On Intel(R) Xeon(R) CPU E7-4820 v4 @ 2.00GHz (GCC 12.2.0 -O3):
    old code 14.4s, new code 8.8s
On sifive,bullet0 / rv64imafdc (GCC 12.2.0 -O3, 1e6 iterations instead of 1e7):
    old code 11.9s, new code 15.2s
On Freescale i.MX53 / ARMv7 Processor rev 5 (v7l) (GCC 12.2.0 -O3, 1e6 iterations instead of 1e7):
    old code 24.1s, new code 15.7s
On POWER8 (architected), altivec supported, BIG ENDIAN, ppc64 (GCC 12.2.0 -O3):
    old code 43.1s, new code 50.8s
On POWER8 (architected), altivec supported, LITTLE ENDIAN, ppc64el (GCC 12.2.0 -O3):
    old code 7.8s, new code 4.7s
On PA8900 (Shortfin) PA-RISC (GCC 12.2.0 -O3, 1e6 iterations instead of 1e7):
    old code 39.9s, new code 47.2s
On IBM/S390 aka s390x (GCC 12.2.0 -O3):
    old code 82.2s, new code 30.8s
On Intel(R) Itanium(R) Processor 9320 (GCC 12.2.0 -O3):
    old code 89.5s, new code 78.1s
On Cavium Octeon III V0.2 FPU V0.0 / mipsel (GCC 12.2.0 -O3):
    old code 117.5s, new code 118.5s

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".