Hi Gavin,

> > >>> In weak memory models such as arm64, the read of {prod,cons}.tail may be
> > >>> reordered after reads or writes of the ring slots, which corrupts the
> > >>> ring and causes stale data to be observed.
> > >>>
> > >>> This issue was reported by NXP on an 8-A72 DPAA2 board. The problem is
> > >>> most likely caused by missing acquire semantics when reading cons.tail
> > >>> (in SP enqueue) or prod.tail (in SC dequeue), which makes it possible
> > >>> to read a stale value from the ring slots.
> > >>>
> > >>> For the MP (and MC) case, rte_atomic32_cmpset() already provides the
> > >>> required ordering. This patch prevents reads and writes of the ring
> > >>> slots from being reordered before the read of {prod,cons}.tail in the
> > >>> SP (and SC) case.
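For reference, a minimal sketch of the acquire/release pairing the commit message describes, written with C11 <stdatomic.h> rather than DPDK's rte_smp_*() barriers. It assumes a hypothetical single-producer/single-consumer ring (spsc_ring, spsc_enqueue() and spsc_dequeue() are illustrative names, not DPDK APIs) in which each side's head and tail collapse into one index, so it is a simplification of rte_ring, not its implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical SP/SC ring for illustration only; not DPDK's struct rte_ring. */
#define RING_SIZE 8u               /* must be a power of two */
#define RING_MASK (RING_SIZE - 1u)

struct spsc_ring {
    _Atomic uint32_t prod_tail;    /* written by producer, read by consumer */
    _Atomic uint32_t cons_tail;    /* written by consumer, read by producer */
    void *slots[RING_SIZE];
};

/* Enqueue one object; returns 1 on success, 0 if the ring is full. */
static int spsc_enqueue(struct spsc_ring *r, void *obj)
{
    /* Only the producer writes prod_tail, so a relaxed load of it is enough. */
    uint32_t head = atomic_load_explicit(&r->prod_tail, memory_order_relaxed);
    /* Acquire: the slot write below cannot be reordered before this load,
     * so a slot the consumer is still reading is never overwritten. */
    uint32_t cons = atomic_load_explicit(&r->cons_tail, memory_order_acquire);
    if (head - cons == RING_SIZE)
        return 0;                  /* full */
    r->slots[head & RING_MASK] = obj;
    /* Release: the slot write is visible before the new tail is. */
    atomic_store_explicit(&r->prod_tail, head + 1, memory_order_release);
    return 1;
}

/* Dequeue one object; returns 1 on success, 0 if the ring is empty. */
static int spsc_dequeue(struct spsc_ring *r, void **obj)
{
    uint32_t head = atomic_load_explicit(&r->cons_tail, memory_order_relaxed);
    /* Acquire: the slot read below cannot be reordered before this load,
     * so a slot the producer has not finished writing is never read. */
    uint32_t prod = atomic_load_explicit(&r->prod_tail, memory_order_acquire);
    if (prod == head)
        return 0;                  /* empty */
    *obj = r->slots[head & RING_MASK];
    /* Release: the slot read is ordered before the slot is handed back. */
    atomic_store_explicit(&r->cons_tail, head + 1, memory_order_release);
    return 1;
}
```

In this sketch, the acquire load of the other side's tail plays the role the commit message assigns to the missing barrier, and the release store of one's own tail is its counterpart that publishes (or retires) the slot contents first.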
> > >>
> > >> The read barrier rte_smp_rmb() is enough to prevent reads of the ring
> > >> from being reordered before the read of the tail. However, to prevent
> > >> *writes* to the ring from being reordered *before the read* of the tail,
> > >> you need a full memory barrier, i.e. rte_smp_mb().
> > >
> > > ISHLD (rte_smp_rmb() is DMB(ishld)) orders LD/LD and LD/ST, while WMB
> > > (the ST option) orders ST/ST.
> > > For more details, please refer to Table B2-1, "Encoding of the DMB and
> > > DSB <option> parameter", in
> > > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
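To make the LD/LD + LD/ST point concrete: in C11 terms, this corresponds to an acquire fence placed after the tail load. As far as I know, GCC and Clang typically lower atomic_thread_fence(memory_order_acquire) to DMB ISHLD on AArch64, but that is a compiler choice, not something the patch relies on. The sketch below mirrors the SP branch of __rte_ring_move_prod_head() using a hypothetical, simplified prod_state struct and sp_move_head() helper; it is not DPDK code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical simplified producer state; field names echo the patch. */
struct prod_state {
    uint32_t head;               /* private to the single producer */
    _Atomic uint32_t cons_tail;  /* published by the consumer */
    uint32_t capacity;
};

/* Reserve n slots for a single producer. Returns n on success, 0 if full. */
static uint32_t sp_move_head(struct prod_state *s, uint32_t n, uint32_t *old_head)
{
    *old_head = s->head;
    uint32_t cons_tail = atomic_load_explicit(&s->cons_tail, memory_order_relaxed);
    uint32_t free_entries = s->capacity + cons_tail - *old_head;
    if (n > free_entries)
        return 0;
    s->head = *old_head + n;
    /* Acquire fence: gives the relaxed cons_tail load above acquire
     * semantics, so later loads AND stores (the slot accesses) cannot move
     * before it. It does not order earlier stores against later loads,
     * which is why it is weaker (and cheaper) than a full barrier. */
    atomic_thread_fence(memory_order_acquire);
    return n;
}
```

Only a store-then-load ordering would still call for a full barrier; the tail-read-then-slot-access pattern in the patch is load-first in both directions.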
> >
> > I see. But then you have to change the rte_smp_rmb() definition in
> > lib/librte_eal/common/include/generic/rte_atomic.h and ensure that all
> > other architectures follow the same rules.
> > Otherwise, this change is logically wrong, because by its current
> > definition a read barrier cannot be used to order a load with a store.
> >
> 
> Good points; let me rethink how to handle the other architectures.
> A full MB is required for other architectures (x86? PPC?), but for arm, a
> read barrier (load/store and load/load) is enough.

For x86, I don't think you need any barrier here, as in the IA memory model:
- Reads are not reordered with other reads.
- Writes are not reordered with older reads.
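A sketch of how a load-to-load/store barrier could be specialised per architecture, following the x86 rules above and the DMB ISHLD semantics quoted earlier. load_to_any_barrier() is a hypothetical name, not an existing DPDK function, and the generic fallback simply uses a C11 acquire fence.

```c
#include <stdatomic.h>

/* Hypothetical helper (not a DPDK API): keep a preceding load ordered
 * before all later loads and stores, using the cheapest option per arch. */
static inline void load_to_any_barrier(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* x86 does not reorder load->load or load->store in hardware,
     * so only the compiler has to be stopped from reordering. */
    __asm__ __volatile__("" ::: "memory");
#elif defined(__aarch64__)
    /* DMB ISHLD: orders earlier loads against later loads and stores. */
    __asm__ __volatile__("dmb ishld" ::: "memory");
#else
    /* Portable fallback for other architectures. */
    atomic_thread_fence(memory_order_acquire);
#endif
}
```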

BTW, could you explain a bit more why a barrier is necessary even on arm here?
As far as I can see, there is a data dependency between the tail value and
the subsequent address calculations for ring writes/reads.
Isn't that sufficient to prevent reordering even on a weak memory model?
Konstantin
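To make the dependency question easier to check, here is a simplified single-consumer dequeue (hypothetical sc_ring and sc_dequeue(), modelled on the shape of __rte_ring_move_cons_head() rather than copied from it). In this sketch, prod_tail feeds only the entry count and the early-return branch, while each slot address is computed from the consumer's own head.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SLOTS 8u   /* hypothetical power-of-two ring size */

/* Hypothetical simplified single-consumer state; not DPDK's layout. */
struct sc_ring {
    uint32_t cons_head;          /* private to the single consumer */
    _Atomic uint32_t prod_tail;  /* published by the producer */
    int slots[SLOTS];
};

/* Dequeue up to n values into out[]; returns how many were taken. */
static uint32_t sc_dequeue(struct sc_ring *r, int *out, uint32_t n)
{
    uint32_t head = r->cons_head;
    /* prod_tail only feeds 'entries' and the branches below. Acquire is
     * used because a control dependency alone does not order later loads. */
    uint32_t prod_tail = atomic_load_explicit(&r->prod_tail, memory_order_acquire);
    uint32_t entries = prod_tail - head;
    if (n > entries)
        n = entries;
    if (n == 0)
        return 0;
    /* Each slot address is derived from 'head', not from prod_tail, so the
     * slot loads have a control dependency, not an address dependency. */
    for (uint32_t i = 0; i < n; i++)
        out[i] = r->slots[(head + i) & (SLOTS - 1u)];
    r->cons_head = head + n;
    return n;
}
```

Under that assumption, the Armv8 memory model allows the slot loads to be satisfied before the prod_tail load completes if only the control dependency connects them, which is why the sketch uses an acquire load instead of a relaxed one.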
 

> 
> > >
> > >>
> > >>>
> > >>> Signed-off-by: gavin hu <gavin...@arm.com>
> > >>> Reviewed-by: Ola Liljedahl <ola.liljed...@arm.com>
> > >>> Tested-by: Nipun Gupta <nipun.gu...@nxp.com>
> > >>> ---
> > >>>  lib/librte_ring/rte_ring_generic.h | 16 ++++++++++------
> > >>>  1 file changed, 10 insertions(+), 6 deletions(-)
> > >>>
> > >>> diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
> > >>> index ea7dbe5..1bd3dfd 100644
> > >>> --- a/lib/librte_ring/rte_ring_generic.h
> > >>> +++ b/lib/librte_ring/rte_ring_generic.h
> > >>> @@ -90,9 +90,11 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
> > >>>                         return 0;
> > >>>
> > >>>                 *new_head = *old_head + n;
> > >>> -               if (is_sp)
> > >>> -                       r->prod.head = *new_head, success = 1;
> > >>> -               else
> > >>> +               if (is_sp) {
> > >>> +                       r->prod.head = *new_head;
> > >>> +                       rte_smp_rmb();
> > >>> +                       success = 1;
> > >>> +               } else
> > >>>                         success = rte_atomic32_cmpset(&r->prod.head,
> > >>>                                         *old_head, *new_head);
> > >>>         } while (unlikely(success == 0));
> > >>> @@ -158,9 +160,11 @@ __rte_ring_move_cons_head(struct rte_ring *r, unsigned int is_sc,
> > >>>                         return 0;
> > >>>
> > >>>                 *new_head = *old_head + n;
> > >>> -               if (is_sc)
> > >>> -                       r->cons.head = *new_head, success = 1;
> > >>> -               else
> > >>> +               if (is_sc) {
> > >>> +                       r->cons.head = *new_head;
> > >>> +                       rte_smp_rmb();
> > >>> +                       success = 1;
> > >>> +               } else
> > >>>                         success = rte_atomic32_cmpset(&r->cons.head, *old_head,
> > >>>                                         *new_head);
> > >>>         } while (unlikely(success == 0));
> > >>>
