Hi Richard,

>> Benchmarking showed that LSE and LSE2 RMW atomics have similar performance
>> once the atomic is acquire, release or both. Given there is already a
>> significant overhead due to the function call, PLT indirection and argument
>> setup, it doesn't make sense to add extra taken branches that may mispredict
>> or cause extra fetch cycles...
>
> Thanks for the extra context, especially wrt the LSE/LSE2 benchmarking.
> If there isn't any difference for acquire vs. the rest, is there a
> justification we can use for keeping the acquire branch, rather than
> using SWPAL for everything except relaxed?

The results showed that acquire is typically slightly faster than release
(5-10%), so for the most frequently used atomics (CAS and SWP) it makes sense
to add support for acquire. In most cases once you have release semantics,
adding acquire didn't make things slower, so combining release/acq_rel/seq_cst
avoids unnecessary extra branches and keeps the code small.

> If so, then Victor, could you include that in the explanation above and
> add it as a source comment? Although maybe tone down "doesn't make
> sense to add" to something like "doesn't seem worth adding". :)

Yes it's worth adding a comment to this effect.

Cheers,
Wilco
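
For reference, the dispatch discussed above can be sketched in C roughly as follows. This is only an illustration of the three-way split (relaxed, acquire, everything else), not the code from the patch: the enum and function names are hypothetical, and treating consume like acquire is an assumption not stated in this thread. The `__ATOMIC_*` constants are the predefined GCC/Clang memory-order macros.

```c
/* Hypothetical names, for illustration only. */
enum swp_variant {
    SWP_PLAIN,    /* relaxed: SWP */
    SWP_ACQ,      /* acquire: SWPA */
    SWP_ACQ_REL   /* release, acq_rel, seq_cst: SWPAL */
};

/* Map a memory order to the instruction variant to use.
   Only relaxed and acquire get their own branch; every order that
   includes release semantics shares the strongest form, which keeps
   the number of taken branches small. */
static enum swp_variant pick_swp_variant(int model)
{
    switch (model) {
    case __ATOMIC_RELAXED:
        return SWP_PLAIN;
    case __ATOMIC_CONSUME:   /* assumption: handled like acquire */
    case __ATOMIC_ACQUIRE:
        return SWP_ACQ;
    default:                 /* release, acq_rel, seq_cst */
        return SWP_ACQ_REL;
    }
}
```

Using SWPAL for a plain release request is stronger than strictly required, which is safe; per the results above it was not measurably slower in most cases.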