On Thu, May 11, 2023 at 6:20 PM Marton Balint <c...@passwd.hu> wrote:
> Actually the cached bitstream reader was faster here than the manual
> approach:
>
> ./ffmpeg -stream_loop 128 -threads 1 -f bitpacked -pix_fmt yuv422p10le -s
> 3840x2160 -c:v bitpacked -i source.yuv -pix_fmt yuv422p10le -f null none
> -loglevel error
>
> Old code:
>
> 821050920 decicycles in bitpacked, 1 runs, 0 skips
> 815402160 decicycles in bitpacked, 2 runs, 0 skips
> 814108410 decicycles in bitpacked, 4 runs, 0 skips
> 814213800 decicycles in bitpacked, 8 runs, 0 skips
> 815048325 decicycles in bitpacked, 16 runs, 0 skips
> 812866713 decicycles in bitpacked, 32 runs, 0 skips
> 809186523 decicycles in bitpacked, 64 runs, 0 skips
> 808317601 decicycles in bitpacked, 128 runs, 0 skips
>
> With the patch:
>
> 379879920 decicycles in bitpacked, 1 runs, 0 skips
> 387491580 decicycles in bitpacked, 2 runs, 0 skips
> 397720260 decicycles in bitpacked, 4 runs, 0 skips
> 389581560 decicycles in bitpacked, 8 runs, 0 skips
> 381820635 decicycles in bitpacked, 16 runs, 0 skips
> 379791675 decicycles in bitpacked, 32 runs, 0 skips
> 379246303 decicycles in bitpacked, 64 runs, 0 skips
> 379221671 decicycles in bitpacked, 128 runs, 0 skips
>
> Old code and #defined CACHED_BITSTREAM_READER 1
>
> 345122280 decicycles in bitpacked, 1 runs, 0 skips
> 343663020 decicycles in bitpacked, 2 runs, 0 skips
> 343372680 decicycles in bitpacked, 4 runs, 0 skips
> 342554535 decicycles in bitpacked, 8 runs, 0 skips
> 340816522 decicycles in bitpacked, 16 runs, 0 skips
> 340225672 decicycles in bitpacked, 32 runs, 0 skips
> 340283520 decicycles in bitpacked, 64 runs, 0 skips
> 339643105 decicycles in bitpacked, 128 runs, 0 skips

I don't have a good explanation for this. I could speculate that some
of it comes down to the processor architecture, how much onboard cache
it has, the gcc version (and what optimization or vectorization it
does, if any), etc. In my case I was testing on Haswell and Skylake
(both with 12 MB of cache) with gcc 4.8. I would welcome feedback from
others.

Looking at the code in libavcodec/get_bits.h, it might also be worth
trying a build with LONG_BITSTREAM_READER defined, as that might speed
things up as well for such large files.

Devin

-- 
Devin Heitmueller, Senior Software Engineer
LTN Global Communications
o: +1 (301) 363-1001
w: https://ltnglobal.com  e: devin.heitmuel...@ltnglobal.com