On 7 February 2025 13:53:22 GMT+02:00, Zhao Zhili <quinkbl...@foxmail.com> wrote:
>
>
>> On Feb 7, 2025, at 19:46, Zhao Zhili <quinkblack-at-foxmail....@ffmpeg.org>
>> wrote:
>>
>>
>>> On Feb 7, 2025, at 19:39, Andreas Rheinhardt
>>> <andreas.rheinha...@outlook.com> wrote:
>>>
>>> Andreas Rheinhardt:
>>>> Ronald S. Bultje:
>>>>> Fixes #11456.
>>>>> ---
>>>>>  libavcodec/threadprogress.c | 3 +--
>>>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/libavcodec/threadprogress.c b/libavcodec/threadprogress.c
>>>>> index 62c4fd898b..aa72ff80e7 100644
>>>>> --- a/libavcodec/threadprogress.c
>>>>> +++ b/libavcodec/threadprogress.c
>>>>> @@ -55,9 +55,8 @@ void ff_thread_progress_report(ThreadProgress *pro, int n)
>>>>>      if (atomic_load_explicit(&pro->progress, memory_order_relaxed) >= n)
>>>>>          return;
>>>>>
>>>>> -    atomic_store_explicit(&pro->progress, n, memory_order_release);
>>>>> -
>>>>>      ff_mutex_lock(&pro->progress_mutex);
>>>>> +    atomic_store_explicit(&pro->progress, n, memory_order_release);
>>>>>      ff_cond_broadcast(&pro->progress_cond);
>>>>>      ff_mutex_unlock(&pro->progress_mutex);
>>>>>  }
>>>>
>>>> I don't really understand why this is supposed to fix a race; after all,
>>>> the synchronisation of ff_thread_progress_(report|await) is not supposed
>>>> to be provided by the mutex (which is avoided altogether in the fast
>>>> path in ff_thread_report_await()), but by storing and loading the
>>>> progress variable.
>>>> That's also the reason why I moved this outside of the mutex (compared
>>>> to ff_thread_report_progress()). (This way it is possible for a consumer
>>>> thread to see the new progress value earlier and possibly avoid the
>>>> mutex altogether.)
>>>>
>>>
>>> Damn, this optimization works, but only if the progress variable is
>>> always read with acquire-semantics; it is currently read via
>>> memory_order_relaxed inside the mutex (just like in
>>> ff_thread_await_progress()).
>>>
>>> According to my understanding, this is what happens:
>>> Consumer thread waits for progress and finds that it is insufficient
>>> (fast path fails)
>>> Producer thread updates progress variable
>>> Consumer thread acquires the mutex and reads new progress via
>>> memory_order_relaxed
>>> Producer thread acquires mutex and broadcasts the new progress
>>>
>>> I'd prefer to change these semantics so that we always perform
>>> synchronisation via the atomic progress variable (unless you know of a
>>> performance impact -- I only know that on x86, both memory_order_relaxed
>>> and memory_order_acquire are ordinary loads).
>>
>> I have considered that solution too, by always using memory_order_acquire
>> in wait progress. memory_order_relaxed is a normal load on ARM, while
>> memory_order_acquire isn’t. So there is a real difference.
>>
>> https://developer.arm.com/documentation/dui0801/l/A64-Data-Transfer-Instructions/LDAPR--A64-
>>
>> Now it’s weird to use memory_order_acquire inside mutex lock.
>
>cc Remi, who has written VLC atomic_wait and mutex from scratch.

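For reference, here is a minimal sketch of the pattern the quoted patch moves towards, written against plain pthreads and C11 atomics rather than FFmpeg's own wrappers. The `Progress` type, the `progress_*` functions, `payload` and `run_demo()` are hypothetical names made up for illustration; this is not the actual libavcodec code. The assumption is that every store to the progress variable happens with the mutex held, so a relaxed load on the slow path is ordered by the mutex, while the lock-free fast path still needs an acquire load.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for ThreadProgress, built on plain pthreads. */
typedef struct Progress {
    atomic_int      progress;
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
} Progress;

/* Data the producer publishes before reporting progress. */
static int payload;

static void progress_init(Progress *p)
{
    atomic_init(&p->progress, -1);
    pthread_mutex_init(&p->mutex, NULL);
    pthread_cond_init(&p->cond, NULL);
}

static void progress_report(Progress *p, int n)
{
    if (atomic_load_explicit(&p->progress, memory_order_relaxed) >= n)
        return;

    /* Store with the mutex held, as in the quoted patch: a waiter that
     * reads the new value on its slow path holds the same mutex, so the
     * mutex orders the two accesses. The release store still serves the
     * lock-free fast path below. */
    pthread_mutex_lock(&p->mutex);
    atomic_store_explicit(&p->progress, n, memory_order_release);
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->mutex);
}

static void progress_await(Progress *p, int n)
{
    /* Fast path: no mutex involved, so the load itself must be an acquire. */
    if (atomic_load_explicit(&p->progress, memory_order_acquire) >= n)
        return;

    /* Slow path: the state is only modified under the mutex, so a relaxed
     * load is enough here. */
    pthread_mutex_lock(&p->mutex);
    while (atomic_load_explicit(&p->progress, memory_order_relaxed) < n)
        pthread_cond_wait(&p->cond, &p->mutex);
    pthread_mutex_unlock(&p->mutex);
}

static void *producer(void *arg)
{
    Progress *p = arg;
    payload = 42;            /* publish data, then report progress */
    progress_report(p, 1);
    return NULL;
}

/* Spawns one producer, waits for progress 1, returns the payload seen. */
static int run_demo(void)
{
    Progress p;
    pthread_t t;
    int seen;

    progress_init(&p);
    if (pthread_create(&t, NULL, producer, &p))
        return -1;
    progress_await(&p, 1);
    seen = payload;
    pthread_join(t, NULL);
    pthread_cond_destroy(&p.cond);
    pthread_mutex_destroy(&p.mutex);
    return seen;
}
```

Built with `-pthread`, `run_demo()` should return the value written by the producer. With the store moved back outside the mutex, the relaxed load on the slow path would no longer be ordered against it, which is the interleaving described in the quoted message.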
It always gets weird when you mix atomics and CVs. CVs must nominally have
some sort of associated state that is modified under the same lock. But in
the preexisting code there is no such thing.

So yeah, if you need the performance properties (or the infallible
initialisation, or the implicit clean-up) of atomics, then you really should
use futexes, not CVs. Otherwise don't use atomics.

In this particular case, whether this is a false positive of TSan or a real
bug depends on the behaviour of other code paths (which I have not had time
to review as of yet).
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".