Re: [dpdk-dev] [PATCH v3 6/8] stack: add C11 atomic implementation

Eads, Gage Mon, 01 Apr 2019 13:22:05 -0700

> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:[email protected]]
> Sent: Monday, April 1, 2019 2:07 PM
> To: Eads, Gage <[email protected]>; '[email protected]' <[email protected]>
> Cc: '[email protected]' <[email protected]>;
> '[email protected]' <[email protected]>; Richardson,
> Bruce <[email protected]>; Ananyev, Konstantin
> <[email protected]>; Gavin Hu (Arm Technology China)
> <[email protected]>; nd <[email protected]>; [email protected]; nd
> <[email protected]>
> Subject: RE: [PATCH v3 6/8] stack: add C11 atomic implementation
> 
> > > Subject: RE: [PATCH v3 6/8] stack: add C11 atomic implementation
> > >
> > > [snip]
> > >
> > > > > +static __rte_always_inline void __rte_stack_lf_push(struct
> > > > > +rte_stack_lf_list *list,
> > > > > +                 struct rte_stack_lf_elem *first,
> > > > > +                 struct rte_stack_lf_elem *last,
> > > > > +                 unsigned int num)
> > > > > +{
> > > > > +#ifndef RTE_ARCH_X86_64
> > > > > +     RTE_SET_USED(first);
> > > > > +     RTE_SET_USED(last);
> > > > > +     RTE_SET_USED(list);
> > > > > +     RTE_SET_USED(num);
> > > > > +#else
> > > > > +     struct rte_stack_lf_head old_head;
> > > > > +     int success;
> > > > > +
> > > > > +     old_head = list->head;
> > > > This can be a torn read (same as you have mentioned in
> > > > __rte_stack_lf_pop). I suggest we use acquire thread fence here as
> > > > well (please see the comments in __rte_stack_lf_pop).
> > >
> > > Agreed. I'll add the acquire fence.
> > >
> >
> > On second thought, an acquire fence isn't necessary. The acquire fence
> > in
> > __rte_stack_lf_pop() ensures the list->head is ordered before the list
> > element reads. That isn't necessary here; we need to ensure that the
> > last->next write occurs (and is observed) before the list->head write,
> > which the CAS's RELEASE success memorder accomplishes.
> >
> > If a torn read occurs, the CAS will fail and will atomically re-load 
> > &old_head.
> 
> Following is my understanding:
> The general guideline is there should be a load-acquire for every store-
> release. In both xxx_lf_pop and xxx_lf_push, the head is store-released,
> hence the load of the head should be load-acquire.
> From the code (for ex: in function _xxx_lf_push), you can notice that there is
> dependency from 'old_head to new_head to list->head(in
> compare_exchange)'. When such a dependency exists, if the memory
> orderings have to be avoided, one needs to use __ATOMIC_CONSUME.
> Currently, the compilers will use a stronger memory order (which is
> __ATOMIC_ACQUIRE) as __ATOMIC_CONSUME is not well defined. Please
> refer to [1] and [2] for more info.
> 
> IMO, since, for 128b, we do not have a pure load-acquire, I suggest we use
> thread_fence with acquire semantics. It is a heavier barrier, but I think it 
> is a
> safer code which will adhere to C11 memory model.
> 
> [1] https://preshing.com/20140709/the-purpose-of-
> memory_order_consume-in-cpp11/
> [2] http://www.open-
> std.org/jtc1/sc22/wg21/docs/papers/2018/p0750r1.html

Thanks for those two links, they're good resources.

I agree with your understanding. I admit I'm not fully convinced the 
synchronized-with relationship is needed between pop's list->head store and 
push's list->head load (or between push's list->head store and its list->head 
load), but it's better to err on the side of caution to ensure it's 
functionally correct...at least until I can manage to convince you :).

I'll send out a V6 with the acquire thread fence.

Thanks,
Gage
Re: [dpdk-dev] [PATCH v3 6/8] stack: add C11 atomic implementation

Reply via email to