> > Subject: RE: [PATCH v3 6/8] stack: add C11 atomic implementation > > > > [snip] > > > > > > +static __rte_always_inline void > > > > +__rte_stack_lf_push(struct rte_stack_lf_list *list, > > > > + struct rte_stack_lf_elem *first, > > > > + struct rte_stack_lf_elem *last, > > > > + unsigned int num) > > > > +{ > > > > +#ifndef RTE_ARCH_X86_64 > > > > + RTE_SET_USED(first); > > > > + RTE_SET_USED(last); > > > > + RTE_SET_USED(list); > > > > + RTE_SET_USED(num); > > > > +#else > > > > + struct rte_stack_lf_head old_head; > > > > + int success; > > > > + > > > > + old_head = list->head; > > > This can be a torn read (same as you have mentioned in > > > __rte_stack_lf_pop). I suggest we use acquire thread fence here as > > > well (please see the comments in __rte_stack_lf_pop). > > > > Agreed. I'll add the acquire fence. > > > > On second thought, an acquire fence isn't necessary. The acquire fence in > __rte_stack_lf_pop() ensures the list->head is ordered before the list element > reads. That isn't necessary here; we need to ensure that the last->next write > occurs (and is observed) before the list->head write, which the CAS's RELEASE > success memorder accomplishes. > > If a torn read occurs, the CAS will fail and will atomically re-load > &old_head.
Following is my understanding: The general guideline is there should be a load-acquire for every store-release. In both xxx_lf_pop and xxx_lf_push, the head is store-released, hence the load of the head should be load-acquire. >From the code (for ex: in function _xxx_lf_push), you can notice that there is >dependency from 'old_head to new_head to list->head(in compare_exchange)'. >When such a dependency exists, if the memory orderings have to be avoided, one >needs to use __ATOMIC_CONSUME. Currently, the compilers will use a stronger >memory order (which is __ATOMIC_ACQUIRE) as __ATOMIC_CONSUME is not well >defined. Please refer to [1] and [2] for more info. IMO, since, for 128b, we do not have a pure load-acquire, I suggest we use thread_fence with acquire semantics. It is a heavier barrier, but I think it is a safer code which will adhere to C11 memory model. [1] https://preshing.com/20140709/the-purpose-of-memory_order_consume-in-cpp11/ [2] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0750r1.html