On Thu, Jul 18, 2024 at 10:20 AM Anton Khirnov <an...@khirnov.net> wrote:

> Quoting Michael Niedermayer (2024-07-18 00:32:38)
> > the data for each decoder task should be together and not scattered around
> > more than needed, reducing cache efficiency
> >
> > putting all this extra code in the inner per pixel loop is not ok
> > especially not for the sake of avoiding a memcpy of a few hundred bytes
> > multiple levels of loops outside
>
> A nice theory, but in practice this patchset makes single-threaded
> decoding about 4% faster overall, on a 1920x1080 10bit sample. That's
> just the ffv1 parts (up to patch 28), full set also improves frame
> threading performance as follows:
> threads   improvement
> ---------------------------
> 2         52% (yes really)
>

What? Sloppy programming skills....

> 4         16%
> 8         12%
>
> --
> Anton Khirnov
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".