Hi Martin,

>> it's my first attempt to do some assembly, it might still include some
>> don'ts of the asm world...
>> Tested with gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
>>
>> Speed-wise, it sees a drop for small prediction orders until around 10 or 11.
>> Well, the maximum prediction order is 1023.
>> I therefore checked with the "real-world" samples from the fate-suite, which
>> suggests low prediction orders are non-dominant:
>>
>> pred_order = {7..17}, gain: 23%
>>
>> als_reconstruct_all_c:    26645.2
>> als_reconstruct_all_neon: 20635.2
>
> This is the combination that the patch actually tests by default, if I read
> the code correctly - right?

Exactly.

> You didn't write what CPU you tested this on - do note that the actual
> performance of the assembly is pretty heavily dependent on the CPU.
>
> I get roughly similar numbers if I build with GCC:
>
>                          Cortex A53       A72       A73
> als_reconstruct_all_c:     107708.2   44044.5   57427.7
> als_reconstruct_all_neon:   78895.7   38464.7   34065.5

It was a remote machine, I don't know exactly which CPU yet. Will find out for v2.

> However - if I build with Clang, where vectorization isn't disabled by
> configure, the C code beats the handwritten assembly:
>
>                          Cortex A53
> als_reconstruct_all_c:      69145.7
> als_reconstruct_all_neon:   78895.7
>
> Even if I only test order 17, the C code still is faster. So clearly we can
> do better - if nothing else, we could copy the assembly code that Clang
> outputs :-)

Narf. Well, maybe some more thought about the code itself will get more speed
manually...

> First a couple technical details about the patch...
> [...]

I very much appreciate your extensive feedback, I will need quite some time
to work through it! :)

Thanks!
-Thilo