Quoting Michael Niedermayer (2024-07-18 00:32:38)
> the data for each decoder task should be together and not scattered around
> more than needed, reducing cache efficiency
>
> putting all this extra code in the inner per pixel loop is not ok
> especially not for the sake of avoiding a memcpy of a few hundred bytes
> multiple levels of loops outside
A nice theory, but in practice this patchset makes single-threaded
decoding about 4% faster overall, on a 1920x1080 10bit sample.

That's just the ffv1 parts (up to patch 28), full set also improves
frame threading performance as follows:

threads   improvement
---------------------------
2         52% (yes really)
4         16%
8         12%

-- 
Anton Khirnov