> On Feb 7, 2025, at 19:39, Andreas Rheinhardt <andreas.rheinha...@outlook.com>
> wrote:
>
> Andreas Rheinhardt:
>> Ronald S. Bultje:
>>> Fixes #11456.
>>> ---
>>>  libavcodec/threadprogress.c | 3 +--
>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/libavcodec/threadprogress.c b/libavcodec/threadprogress.c
>>> index 62c4fd898b..aa72ff80e7 100644
>>> --- a/libavcodec/threadprogress.c
>>> +++ b/libavcodec/threadprogress.c
>>> @@ -55,9 +55,8 @@ void ff_thread_progress_report(ThreadProgress *pro, int n)
>>>      if (atomic_load_explicit(&pro->progress, memory_order_relaxed) >= n)
>>>          return;
>>>
>>> -    atomic_store_explicit(&pro->progress, n, memory_order_release);
>>> -
>>>      ff_mutex_lock(&pro->progress_mutex);
>>> +    atomic_store_explicit(&pro->progress, n, memory_order_release);
>>>      ff_cond_broadcast(&pro->progress_cond);
>>>      ff_mutex_unlock(&pro->progress_mutex);
>>>  }
>>
>> I don't really understand why this is supposed to fix a race; after all,
>> the synchronisation of ff_thread_progress_(report|await) is not supposed
>> to be provided by the mutex (which is avoided altogether in the fast
>> path in ff_thread_report_await()), but by storing and loading the
>> progress variable.
>> That's also the reason why I moved this outside of the mutex (compared
>> to ff_thread_report_progress(). (This way it is possible for a consumer
>> thread to see the new progress value earlier and possibly avoid the
>> mutex altogether.)
>>
>
> Damn, this optimization works, but only if the progress variable is
> always read with acquire-semantics; it is currently read via
> memory_order_relaxed inside the mutex (just like in
> ff_thread_await_progress()).
>
> According to my understanding, this is what happens:
> Consumer thread waits for progress and finds that it is insufficient
> (fast path fails)
> Producer thread updates progress variable
> Consumer thread acquires the mutex and reads new progress via
> memory_order_relaxed
> Producer thread acquires mutex and broadcasts the new progress
>
> I'd prefer to change these semantics so that we always perform
> synchronisation via the atomic progress variable (unless you know of a
> performance impact -- I only know that on x86, both memory_order_relaxed
> and memory_order_acquire are ordinary loads).
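
For reference, a minimal sketch of the always-acquire variant described above, written with plain pthreads and C11 atomics rather than FFmpeg's ff_mutex_*/ff_cond_* wrappers. The Progress struct and the progress_init(), progress_report(), progress_await() and run_demo() names are hypothetical and only for illustration; this is not the code in libavcodec/threadprogress.c, and it assumes a single producer per progress variable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for ThreadProgress; all names are illustrative. */
typedef struct Progress {
    atomic_int      progress;
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
} Progress;

static void progress_init(Progress *p)
{
    atomic_init(&p->progress, -1);
    pthread_mutex_init(&p->mutex, NULL);
    pthread_cond_init(&p->cond, NULL);
}

static void progress_report(Progress *p, int n)
{
    /* Single producer assumed, so a relaxed read of our own value is fine. */
    if (atomic_load_explicit(&p->progress, memory_order_relaxed) >= n)
        return;
    /* Release store outside the mutex: publishes all earlier writes. */
    atomic_store_explicit(&p->progress, n, memory_order_release);
    /* The mutex is only used to avoid losing the wakeup. */
    pthread_mutex_lock(&p->mutex);
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->mutex);
}

static void progress_await(Progress *p, int n)
{
    /* Fast path: the acquire load pairs with the release store above. */
    if (atomic_load_explicit(&p->progress, memory_order_acquire) >= n)
        return;
    pthread_mutex_lock(&p->mutex);
    /* Acquire here too: the store is not ordered by this mutex, so a
     * relaxed load could observe n without the data written before it. */
    while (atomic_load_explicit(&p->progress, memory_order_acquire) < n)
        pthread_cond_wait(&p->cond, &p->mutex);
    pthread_mutex_unlock(&p->mutex);
}

static Progress g_prog;
static int g_payload; /* plain data published through the progress value */

static void *producer(void *arg)
{
    (void)arg;
    g_payload = 42;               /* written before the release store */
    progress_report(&g_prog, 1);
    return NULL;
}

/* Returns the payload as seen by the consumer, or -1 on thread failure. */
static int run_demo(void)
{
    pthread_t th;
    int seen;

    progress_init(&g_prog);
    g_payload = 0;
    if (pthread_create(&th, NULL, producer, NULL) != 0)
        return -1;
    progress_await(&g_prog, 1);   /* after this, g_payload is safe to read */
    seen = g_payload;
    pthread_join(th, NULL);
    return seen;
}
```

With this pairing, run_demo() is expected to return 42 whenever thread creation succeeds, whether the consumer takes the fast path or the slow path. The cost is that the load inside the loop is no longer a relaxed load.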

I have considered that solution too: always using memory_order_acquire in
the await path. On ARM, a memory_order_relaxed load is an ordinary load,
while a memory_order_acquire load is not, so there is a real difference:

https://developer.arm.com/documentation/dui0801/l/A64-Data-Transfer-Instructions/LDAPR--A64-

It also looks odd to use memory_order_acquire while holding the mutex.

>
> Thanks for looking into this.
>
> - Andreas

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".