Wilco Dijkstra <wilco.dijks...@arm.com> writes:
> Hi,
>
>>> Is there no benefit to using SWPPL for RELEASE here?  Similarly for the
>>> others.
>>
>> We started off implementing all possible memory orderings available.
>> Wilco saw value in merging less restricted orderings into more
>> restricted ones - mainly to reduce codesize in less frequently used atomics.
>>
>> This saw us combine RELEASE and ACQ_REL/SEQ_CST cases to make functions
>> a little smaller.
>
> Benchmarking showed that LSE and LSE2 RMW atomics have similar performance
> once the atomic is acquire, release or both. Given there is already a
> significant overhead due to the function call, PLT indirection and argument
> setup, it doesn't make sense to add extra taken branches that may mispredict
> or cause extra fetch cycles...

Thanks for the extra context, especially wrt the LSE/LSE2 benchmarking.
If there isn't any difference for acquire vs. the rest, is there a
justification we can use for keeping the acquire branch, rather than
using SWPAL for everything except relaxed?

If so, then Victor, could you include that in the explanation above and
add it as a source comment?  Although maybe tone down "doesn't make
sense to add" to something like "doesn't seem worth adding". :)

Richard
