> On Feb 7, 2025, at 19:39, Andreas Rheinhardt <andreas.rheinha...@outlook.com> 
> wrote:
> 
> Andreas Rheinhardt:
>> Ronald S. Bultje:
>>> Fixes #11456.
>>> ---
>>> libavcodec/threadprogress.c | 3 +--
>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>> 
>>> diff --git a/libavcodec/threadprogress.c b/libavcodec/threadprogress.c
>>> index 62c4fd898b..aa72ff80e7 100644
>>> --- a/libavcodec/threadprogress.c
>>> +++ b/libavcodec/threadprogress.c
>>> @@ -55,9 +55,8 @@ void ff_thread_progress_report(ThreadProgress *pro, int n)
>>>     if (atomic_load_explicit(&pro->progress, memory_order_relaxed) >= n)
>>>         return;
>>> 
>>> -    atomic_store_explicit(&pro->progress, n, memory_order_release);
>>> -
>>>     ff_mutex_lock(&pro->progress_mutex);
>>> +    atomic_store_explicit(&pro->progress, n, memory_order_release);
>>>     ff_cond_broadcast(&pro->progress_cond);
>>>     ff_mutex_unlock(&pro->progress_mutex);
>>> }
>> 
>> I don't really understand why this is supposed to fix a race; after all,
>> the synchronisation of ff_thread_progress_(report|await) is not supposed
>> to be provided by the mutex (which is avoided altogether in the fast
>> path in ff_thread_progress_await()), but by storing and loading the
>> progress variable.
>> That's also the reason why I moved this outside of the mutex (compared
>> to ff_thread_report_progress()). (This way it is possible for a consumer
>> thread to see the new progress value earlier and possibly avoid the
>> mutex altogether.)
>> 
> 
> Damn, this optimization works, but only if the progress variable is
> always read with acquire-semantics; it is currently read via
> memory_order_relaxed inside the mutex (just like in
> ff_thread_await_progress()).
> 
> According to my understanding, this is what happens:
> 1. Consumer thread waits for progress and finds that it is insufficient
>    (fast path fails).
> 2. Producer thread updates the progress variable.
> 3. Consumer thread acquires the mutex and reads the new progress via
>    memory_order_relaxed.
> 4. Producer thread acquires the mutex and broadcasts the new progress.
> 
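The interleaving above can be sketched as follows. This is only a minimal
sketch using plain pthreads and C11 atomics rather than the ff_mutex/ff_cond
wrappers; LockedProgress, locked_init, locked_report and locked_await are
hypothetical names, and the await side is an assumed shape of the current
code (acquire load in the fast path, relaxed load under the mutex). With the
store moved under the mutex as in the patch, the relaxed load in the wait
loop is ordered by the mutex itself:

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for ThreadProgress; not the FFmpeg struct. */
typedef struct LockedProgress {
    atomic_int      value;
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
} LockedProgress;

static void locked_init(LockedProgress *p)
{
    atomic_init(&p->value, -1);          /* nothing reported yet */
    pthread_mutex_init(&p->mutex, NULL);
    pthread_cond_init(&p->cond, NULL);
}

/* Producer, as in the patch: the store happens while the mutex is held. */
static void locked_report(LockedProgress *p, int n)
{
    if (atomic_load_explicit(&p->value, memory_order_relaxed) >= n)
        return;

    pthread_mutex_lock(&p->mutex);
    /* release is still needed for consumers that take the fast path */
    atomic_store_explicit(&p->value, n, memory_order_release);
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->mutex);
}

/* Consumer (assumed shape): relaxed load in the slow path. */
static void locked_await(LockedProgress *p, int n)
{
    /* Fast path: the acquire load pairs with the release store. */
    if (atomic_load_explicit(&p->value, memory_order_acquire) >= n)
        return;

    pthread_mutex_lock(&p->mutex);
    /* A new value seen here was stored under the same mutex, so the
     * lock/unlock pair orders the producer's earlier writes before us. */
    while (atomic_load_explicit(&p->value, memory_order_relaxed) < n)
        pthread_cond_wait(&p->cond, &p->mutex);
    pthread_mutex_unlock(&p->mutex);
}
```

If the store stays outside the mutex instead, the relaxed load in the loop
can observe the new value in step 3 without anything ordering the producer's
earlier writes before the consumer's reads.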
> I'd prefer to change these semantics so that we always perform
> synchronisation via the atomic progress variable (unless you know of a
> performance impact -- I only know that on x86, both memory_order_relaxed
> and memory_order_acquire are ordinary loads).

I have considered that solution too: always use memory_order_acquire
when waiting for progress. On ARM, a memory_order_relaxed load is an
ordinary load, while a memory_order_acquire load is not, so there is a
real difference.

https://developer.arm.com/documentation/dui0801/l/A64-Data-Transfer-Instructions/LDAPR--A64-

It also looks odd to use memory_order_acquire while holding the mutex.
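
For reference, a minimal sketch of the always-acquire variant, assuming
plain pthreads and C11 atomics instead of the ff_mutex/ff_cond wrappers.
AcqProgress, acq_init, acq_report and acq_await are hypothetical names, not
FFmpeg API. The store stays outside the mutex, and the load in the wait loop
becomes an acquire load, which is the part that looks odd and is not free
on ARM:

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for ThreadProgress; not the FFmpeg struct. */
typedef struct AcqProgress {
    atomic_int      value;
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
} AcqProgress;

static void acq_init(AcqProgress *p)
{
    atomic_init(&p->value, -1);          /* nothing reported yet */
    pthread_mutex_init(&p->mutex, NULL);
    pthread_cond_init(&p->cond, NULL);
}

/* Producer: publish first, then take the mutex only to wake waiters. */
static void acq_report(AcqProgress *p, int n)
{
    if (atomic_load_explicit(&p->value, memory_order_relaxed) >= n)
        return;

    atomic_store_explicit(&p->value, n, memory_order_release);

    pthread_mutex_lock(&p->mutex);
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->mutex);
}

/* Consumer: every load that can end the wait is an acquire load, so all
 * synchronisation goes through the atomic variable alone. */
static void acq_await(AcqProgress *p, int n)
{
    if (atomic_load_explicit(&p->value, memory_order_acquire) >= n)
        return;

    pthread_mutex_lock(&p->mutex);
    while (atomic_load_explicit(&p->value, memory_order_acquire) < n)
        pthread_cond_wait(&p->cond, &p->mutex);
    pthread_mutex_unlock(&p->mutex);
}
```

The mutex is still needed here so that a waiter cannot miss the broadcast
between checking the value and going to sleep; it just no longer carries
the data ordering.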

> 
> Thanks for looking into this.
> 
> - Andreas
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
