Quoting Michael Niedermayer (2024-07-18 19:40:04)
> On Thu, Jul 18, 2024 at 10:20:09AM +0200, Anton Khirnov wrote:
> > Quoting Michael Niedermayer (2024-07-18 00:32:38)
> > > the data for each decoder task should be together and not scattered around
> > > more than needed, reducing cache efficiency
> > >
> > > putting all this extra code in the inner per pixel loop is not ok
> > > especially not for the sake of avoiding a memcpy of a few hundread bytes
> > > multiple levels of loops outside
> >
> > A nice theory, but in practice this patchset makes single-threaded
> > decoding about 4% faster overall, on a 1920x1080 10bit sample. That's
> > just the ffv1 parts (up to patch 28), full set also improves frame
> > threading performance as follows:
> >
> > threads   improvement
> > ---------------------------
> > 2         52% (yes really)
> > 4         16%
> > 8         12%
>
> I do want the speed improvements, yes.
>
> But
> you compare frame threading when slice threading performed
> much better than frame threading prior to the patch

If that were true in general, there'd be no reason for frame threading
support in ffv1, as it has a higher latency and uses more memory; higher
performance is its only advantage.

However you added frame threading in
a0c0900e470fde0d6db360e555620476c2323895 claiming it is faster, which I
can partially confirm even with current master - slice threading
saturates at thread count = slice count, while frame threading scales
beyond it. Frame threading also improves significantly after this set:

threads | slice   | frame/before | frame/after
-----------------------------------------------
 2        22.6124   43.738         22.0354
 4        14.3367   15.115         13.1964
 6        14.3850   11.974         10.9745
 8        14.3472   9.7229         8.76617
10        14.3579   8.4638         8.6499
12        14.3665   8.4636         8.5735
16        14.2960   7.6926         7.1696
-----------------------------------------------
(values are total decode time in seconds)

Note that after this set frame threading is ALWAYS faster than slice
threading, for any thread count.

> also id like to see the individual changes which look like they should
> make teh code slower, to be tested individually. If they make the code slower
> they should be dropped

I don't think it's meaningful to individually benchmark the patches
moving per-slice data into the new per-slice context. I split them to
simplify testing and review, but it only makes sense to apply all of
them or none, otherwise the code gets more complex.

-- 
Anton Khirnov
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
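
For anyone who wants to check the timing table above mechanically, here is a minimal sketch in Python. The numbers are copied verbatim from the table. The names `TIMES` and `summarize` are hypothetical, and the "change" metric (percentage reduction in total decode time relative to frame/before) is an assumption about what is worth looking at, not necessarily how the percentages quoted earlier in the thread were derived.

```python
# Total decode times in seconds, copied from the table in this mail.
# Mapping: thread count -> (slice, frame/before, frame/after)
TIMES = {
    2:  (22.6124, 43.738, 22.0354),
    4:  (14.3367, 15.115, 13.1964),
    6:  (14.3850, 11.974, 10.9745),
    8:  (14.3472, 9.7229, 8.76617),
    10: (14.3579, 8.4638, 8.6499),
    12: (14.3665, 8.4636, 8.5735),
    16: (14.2960, 7.6926, 7.1696),
}


def summarize(times):
    """Return one dict per thread count with two derived values.

    frame_beats_slice: True if frame/after took less time than slice.
    change_vs_before_pct: reduction in decode time of frame/after
        relative to frame/before, in percent (negative means slower).
    """
    rows = []
    for threads, (slice_t, before, after) in sorted(times.items()):
        rows.append({
            "threads": threads,
            "frame_beats_slice": after < slice_t,
            "change_vs_before_pct": 100.0 * (before - after) / before,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(TIMES):
        print("{threads:>2} threads: frame/after < slice: {frame_beats_slice}, "
              "change vs frame/before: {change_vs_before_pct:+.1f}%".format(**row))
```

Running it prints one line per thread count; the `frame_beats_slice` column is the comparison the "ALWAYS faster" statement refers to.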