Re: -mcx16 vs. not using CAS for atomic loads

Torvald Riegel Tue, 24 Jan 2017 01:10:01 -0800

On Fri, 2017-01-20 at 09:55 -0800, Richard Henderson wrote:
> On 01/19/2017 10:23 AM, Torvald Riegel wrote:
> > I think I prefer Option 3b as the short-term solution.  It does not
> > break programs (except the __atomic_always_lock_free assertion scenario,
> > but that's likely to not work anyway given that the atomics will be
> > lock-free but not "fast").  It makes programs aware that the atomics
> > will not be fast when they are not fast indeed (ie, when getting loads
> > through cmpxchg).
> 
> I agree.  Let's go through the library for the loads, giving us a hook to fix 
> this in the future.


I'm working on a patch for this.
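
For readers following along, here is a minimal sketch of what "getting
loads through cmpxchg" means.  It is not the patch itself.  The helper
name load_via_cas is hypothetical, and a 64-bit value stands in for the
16-byte case so that the example builds without -mcx16; the 16-byte
version has the same shape.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: read *p using only compare-and-swap.
   Assumption: a 64-bit object stands in for the 16-byte one. */
uint64_t load_via_cas(_Atomic uint64_t *p)
{
    uint64_t expected = 0;

    /* If *p == 0, the CAS succeeds and stores 0 over 0, so the value
       is unchanged.  Otherwise the CAS fails and copies the current
       value of *p into 'expected'.  Either way 'expected' ends up
       holding the value that was in *p. */
    atomic_compare_exchange_strong(p, &expected, 0);
    return expected;
}
```

Note that the pointer cannot be const: the "load" needs write access to
the location, so it faults on read-only memory, and concurrent loads
contend for exclusive ownership of the cache line.  That is the sense in
which such a load is lock-free but not fast.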

> > I'm worried that Option 4 would not be possible until some time in the
> > future when we have actually gotten confirmation from the HW vendors
> > about 16-byte atomic loads.  The additional risk is that we may never
> > get such a confirmation (eg, because they do not want to constrain
> > future HW), or that this actually holds just for a few processors.
> 
> Indeed, I don't think we'll get any proper confirmation from the hw vendors
> any time soon.  Or possibly ever.
> 
> The only light on the horizon that I can see is that HTM is now working in 
> newly shipping Intel processors, and we could write a pure load path through 
> libatomic that uses that.  Over time the lack of guaranteed SSE atomicity 
> becomes less relevant.

Unless HW transactions are guaranteed to succeed for scenarios that are
sufficient for the atomics, HTM won't help because we'd have to consider
the worst case, which would mean some non-HTM fallback.
Intel's current HTM does not make guarantees; IIRC, either Power or s390
has an HTM mode in which there are guarantees, provided that the user
follows a few rules.
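
To illustrate why a best-effort HTM still forces a fallback, here is a
rough sketch.  Everything in it is an assumption made for illustration:
the pair128 type, the txn_load_fn hook standing in for a hardware
transaction, the two stub implementations, and the spinlock fallback.
It is not how libatomic is implemented.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 16-byte object. */
typedef struct { uint64_t lo, hi; } pair128;

/* Hypothetical hook for a best-effort hardware transaction.  Returns
   true if the transaction committed and *out is a consistent snapshot.
   Without a forward-progress guarantee it may return false forever. */
typedef bool (*txn_load_fn)(const pair128 *src, pair128 *out);

/* Stub: models a transaction that always aborts. */
bool txn_never_commits(const pair128 *src, pair128 *out)
{
    (void)src; (void)out;
    return false;
}

/* Stub: models a transaction that always commits. */
bool txn_always_commits(const pair128 *src, pair128 *out)
{
    *out = *src;
    return true;
}

/* Fallback lock.  In a real design, writers and the transactional
   path would also have to observe this lock. */
static atomic_flag fallback_lock = ATOMIC_FLAG_INIT;

pair128 load128(const pair128 *src, txn_load_fn try_txn,
                int max_attempts, bool *used_fallback)
{
    pair128 v;

    /* Fast path: try the transaction a bounded number of times. */
    for (int i = 0; i < max_attempts; i++) {
        if (try_txn(src, &v)) {
            *used_fallback = false;
            return v;
        }
    }

    /* Worst case: every attempt aborted, so a non-HTM path is needed. */
    while (atomic_flag_test_and_set_explicit(&fallback_lock,
                                             memory_order_acquire))
        ;  /* spin until the lock is free */
    v = *src;
    atomic_flag_clear_explicit(&fallback_lock, memory_order_release);
    *used_fallback = true;
    return v;
}
```

Calling load128(&x, txn_never_commits, 3, &fb) always ends up on the
locked path, which is the case the atomics would have to be specified
for as long as the hardware makes no guarantees.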
