load order in enqueue and dequeue

Ananyev, Konstantin Fri, 10 Nov 2017 01:59:31 -0800


> -----Original Message-----
> From: Jia He [mailto:hejia...@gmail.com]
> Sent: Friday, November 10, 2017 1:51 AM
> To: jerin.ja...@caviumnetworks.com; dev@dpdk.org; olivier.m...@6wind.com
> Cc: Ananyev, Konstantin <konstantin.anan...@intel.com>; Richardson, Bruce 
> <bruce.richard...@intel.com>; jianbo....@arm.com;
> hemant.agra...@nxp.com; Jia He <hejia...@gmail.com>; Jia He 
> <jia...@hxt-semitech.com>; jie2....@hxt-semitech.com; bing.zhao@hxt-
> semitech.com
> Subject: [PATCH v5 1/1] ring: guarantee load/load order in enqueue and dequeue
> 
> We watched a rte panic of mbuf_autotest in our qualcomm arm64 server.
> In __rte_ring_move_cons_head()
> ...
>         do {
>                 /* Restore n as it may change every loop */
>                 n = max;
> 
>                 *old_head = r->cons.head;                //1st load
>                 const uint32_t prod_tail = r->prod.tail; //2nd load
> 
> cpu1(producer)          cpu2(consumer)          cpu3(consumer)
>                         load r->prod.tail
> in enqueue:
> load r->cons.tail
> load r->prod.head
> 
> store r->prod.tail
> 
>                                                 load r->cons.head
>                                                 load r->prod.tail
>                                                 ...
>                                                 store r->cons.{head,tail}
>                         load r->cons.head
> 
> In weak memory order architectures(powerpc,arm), the 2nd load might be
> reodered before the 1st load, that makes *entries is bigger than we
> wanted. This nasty reording messed enque/deque up. Then, r->cons.head
> will be bigger than prod_tail, then make *entries very big and the
> consumer will go forward incorrectly.
> 
> After this patch, even with above context switches, the old cons.head
> will be recaculated after failure of rte_atomic32_cmpset. So no race
> conditions left.
> 
> There is no such issue on X86, because X86 is strong memory order model.
> But rte_smp_rmb() doesn't have impact on runtime performance on X86, so
> keep the same code without architectures specific concerns.
> 
> Signed-off-by: Jia He <jia...@hxt-semitech.com>
> Signed-off-by: jie2....@hxt-semitech.com
> Signed-off-by: bing.z...@hxt-semitech.com
> ---
>  lib/librte_ring/rte_ring.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index 5e9b3b7..3e8085a 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -409,6 +409,11 @@ __rte_ring_move_prod_head(struct rte_ring *r, int is_sp,
>               n = max;
> 
>               *old_head = r->prod.head;
> +
> +             /* add rmb barrier to avoid load/load reorder in weak
> +              * memory model. It is noop on x86 */
> +             rte_smp_rmb();
> +
>               const uint32_t cons_tail = r->cons.tail;
>               /*
>                *  The subtraction is done between two unsigned 32bits value
> @@ -517,6 +522,11 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
>               n = max;
> 
>               *old_head = r->cons.head;
> +
> +             /* add rmb barrier to avoid load/load reorder in weak
> +              * memory model. It is noop on x86 */
> +             rte_smp_rmb();
> +
>               const uint32_t prod_tail = r->prod.tail;
>               /* The subtraction is done between two unsigned 32bits value
>                * (the result is always modulo 32 bits even if we have
> --


Acked-by: Konstantin Ananyev <konstantin.anan...@intel.com>

> 2.7.4

Re: [dpdk-dev] [PATCH v5 1/1] ring: guarantee load/load order in enqueue and dequeue

Reply via email to