On Fri, 2017-01-20 at 09:55 -0800, Richard Henderson wrote:
> On 01/19/2017 10:23 AM, Torvald Riegel wrote:
> > I think I prefer Option 3b as the short-term solution.  It does not
> > break programs (except the __atomic_always_lock_free assertion scenario,
> > but that's likely to not work anyway given that the atomics will be
> > lock-free but not "fast").  It makes programs aware that the atomics
> > will not be fast when they are not fast indeed (ie, when getting loads
> > through cmpxchg).
>
> I agree.  Let's go through the library for the loads, giving us a hook to fix
> this in the future.

I'm working on a patch for this.

> > I'm worried that Option 4 would not be possible until some time in the
> > future when we have actually gotten confirmation from the HW vendors
> > about 16-byte atomic loads.  The additional risk is that we may never
> > get such a confirmation (eg, because they do not want to constrain
> > future HW), or that this actually holds just for a few processors.
>
> Indeed, I don't think we'll get any proper confirmation from the hw vendors
> any time soon.  Or possibly ever.
>
> The only light on the horizon that I can see is that HTM is now working in
> newly shipping Intel processors, and we could write a pure load path through
> libatomic that uses that.  Over time the lack of guaranteed SSE atomicity
> becomes less relevant.

Unless HW transactions are guaranteed to succeed for scenarios that are
sufficient for the atomics, HTM won't help because we'd have to consider
the worst-case, which would mean some non-HTM fallback.  Intel's current
HTM does not make guarantees; IIRC, either Power or s390 have an HTM mode
in which there are guarantees, provided that the user follows a few
rules.
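
For readers unfamiliar with what "getting loads through cmpxchg" means, here is a minimal sketch, not the libatomic implementation. It assumes GCC or Clang (for the `__atomic_compare_exchange_n` builtin) and uses a 64-bit object so that it builds anywhere without `-mcx16` or libatomic; the 16-byte case discussed above would rely on cmpxchg16b instead. The names `load_via_cas` and `demo` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: an atomic load built from compare-and-swap.
   The CAS compares *p with 0.  If they match, it stores 0 again, which
   leaves the value unchanged.  If they differ, the CAS fails and writes
   the current value of *p into 'expected'.  Either way 'expected' ends
   up holding the value that was observed atomically. */
static uint64_t load_via_cas(uint64_t *p)
{
    uint64_t expected = 0;
    __atomic_compare_exchange_n(p, &expected, 0, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}

/* Small self-check covering both the failing and the succeeding CAS. */
static void demo(void)
{
    uint64_t x = 42;
    assert(load_via_cas(&x) == 42);  /* CAS fails, value is reported */
    assert(x == 42);                 /* object is left unchanged */

    uint64_t z = 0;
    assert(load_via_cas(&z) == 0);   /* CAS succeeds, stores 0 over 0 */
    assert(z == 0);
}
```

Note that such a load is a read-modify-write operation: it needs exclusive ownership of the cache line and requires the object to live in writable memory, which is why it is lock-free but not "fast" in the sense used above.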