This patch was originally submitted in 2017 (author/date info left intact). At the time it didn't get much attention, I assume due to its sheer size. I have split out only the QPEL/EPEL parts, rebased them, and cleaned up the patches as much as is reasonable for a 9001-line diff. I also have SAO band (non-working) and 32x32 IDCT (working, but honestly in a worse state than these patches).
This patch gives a large overall speedup, roughly 30% in my testing. The only problems are, as previously stated: 1) it's a lot of code, because the original author didn't make use of macros; 2) it's only 8-bit. I will be writing 10-bit assembly, and whilst I do that I will clean up and macro-ify the current 8-bit assembly, though there is still lots to be done. Our current IDCTs for HEVC aren't great either; I had a 40% speedup on the 16x16 one in testing. The assembly is far from 'done', but we're slowly getting closer at least.

There were some suggestions for smaller improvements in the previous reviews, and I have not applied those. The first course of action is to refactor the code so that it is possible to work on it without going insane. I think it's fine to use it whilst I'm working on refactoring it, given the large speedup: the code weight in the binary should be relatively similar even after that anyway.

Also, updated kperf patch as per Lynne's request.

--
Josh