> -----Original Message-----
> From: Min Zhou <zhou...@loongson.cn>
> Sent: Tuesday, June 13, 2023 5:44 PM
> To: tho...@monjalon.net; qi.z.zh...@intel.com; m...@smartsharesystems.com;
> konstantin.v.anan...@yandex.ru; Ruifeng Wang <ruifeng.w...@arm.com>;
> d...@linux.vnet.ibm.com; roret...@linux.microsoft.com; qiming.y...@intel.com;
> wenjun1...@intel.com; zhou...@loongson.cn
> Cc: dev@dpdk.org; sta...@dpdk.org; maob...@loongson.cn;
> jiawe...@trustnetic.com
> Subject: [PATCH v4] net/ixgbe: add proper memory barriers for some Rx
> functions
>
> A segmentation fault has been observed while running the
> ixgbe_recv_pkts_lro() function to receive packets on a Loongson 3C5000
> processor, which has 64 cores and 4 NUMA nodes.
>
> From the ixgbe_recv_pkts_lro() function, we found that whenever the first
> packet has the EOP bit set and its length is less than or equal to
> rxq->crc_len, the segmentation fault will definitely happen, even on other
> platforms. For example, if we force the first packet that has the EOP bit
> set to have a zero length, the segmentation fault also happens on x86.
>
> When the first packet is processed, first_seg->next is NULL. If at the same
> time this packet has the EOP bit set and its length is less than or equal
> to rxq->crc_len, the following loop will be executed:
>
> for (lp = first_seg; lp->next != rxm; lp = lp->next)
>     ;
>
> Since first_seg->next is NULL under this condition, the expression
> lp->next->next causes the segmentation fault.
>
> Normally, the length of the first packet with the EOP bit set will be
> greater than rxq->crc_len. However, out-of-order execution by the CPU may
> make the read ordering of the status and the rest of the descriptor fields
> in this function incorrect. The related code is as follows:
>
> rxdp = &rx_ring[rx_id];
> #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);
>
> if (!(staterr & IXGBE_RXDADV_STAT_DD))
>     break;
>
> #2 rxd = *rxdp;
>
> Statement #2 may be executed before statement #1. This can make the ready
> packet appear to have zero length. If that packet is the first packet and
> has the EOP bit set, the segmentation fault described above happens.
>
> So, we should add a proper memory barrier to ensure the correct read
> ordering. We also do the same in the ixgbe_recv_pkts() function to keep the
> rxd data valid, even though we did not observe a segmentation fault there.
>
> Fixes: 8eecb3295ae ("ixgbe: add LRO support")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Min Zhou <zhou...@loongson.cn>
> ---
> v4:
> - Replace rte_smp_rmb() with rte_atomic_thread_fence() as the proper
>   memory barrier
> ---
> v3:
> - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb()
> ---
> v2:
> - Call rte_rmb() for all platforms
> ---
>
>  drivers/net/ixgbe/ixgbe_rxtx.c | 47 +++++++++++++++-------------------
>  1 file changed, 21 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> index 6cbb992823..61f17cd90b 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> @@ -1817,11 +1817,22 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
>       * of accesses cannot be reordered by the compiler. If they were
>       * not volatile, they could be reordered which could lead to
>       * using invalid descriptor fields when read from rxd.
> +     *
> +     * Meanwhile, to prevent the CPU from executing out of order, we
> +     * need to use a proper memory barrier to ensure the memory
> +     * ordering below.
>       */
>      rxdp = &rx_ring[rx_id];
>      staterr = rxdp->wb.upper.status_error;
>      if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD)))
>          break;
> +
> +    /*
> +     * Use acquire fence to ensure that status_error which includes
> +     * DD bit is loaded before loading of other descriptor words.
> +     */
> +    rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> +
>      rxd = *rxdp;
>
>      /*
> @@ -2088,32 +2099,10 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts,
>
>  next_desc:
>      /*
> -     * The code in this whole file uses the volatile pointer to
> -     * ensure the read ordering of the status and the rest of the
> -     * descriptor fields (on the compiler level only!!!). This is so
> -     * UGLY - why not to just use the compiler barrier instead? DPDK
> -     * even has the rte_compiler_barrier() for that.
> -     *
> -     * But most importantly this is just wrong because this doesn't
> -     * ensure memory ordering in a general case at all. For
> -     * instance, DPDK is supposed to work on Power CPUs where
> -     * compiler barrier may just not be enough!
> -     *
> -     * I tried to write only this function properly to have a
> -     * starting point (as a part of an LRO/RSC series) but the
> -     * compiler cursed at me when I tried to cast away the
> -     * "volatile" from rx_ring (yes, it's volatile too!!!). So, I'm
> -     * keeping it the way it is for now.
> -     *
> -     * The code in this file is broken in so many other places and
> -     * will just not work on a big endian CPU anyway therefore the
> -     * lines below will have to be revisited together with the rest
> -     * of the ixgbe PMD.
> -     *
> -     * TODO:
> -     *    - Get rid of "volatile" and let the compiler do its job.
> -     *    - Use the proper memory barrier (rte_rmb()) to ensure the
> -     *      memory ordering below.
> +     * "Volatile" only prevents caching of the variable marked
> +     * volatile. Most important, "volatile" cannot prevent the CPU
> +     * from executing out of order. So, it is necessary to use a
> +     * proper memory barrier to ensure the memory ordering below.
>       */
>      rxdp = &rx_ring[rx_id];
>      staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);
> @@ -2121,6 +2110,12 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts,
>      if (!(staterr & IXGBE_RXDADV_STAT_DD))
>          break;
>
> +    /*
> +     * Use acquire fence to ensure that status_error which includes
> +     * DD bit is loaded before loading of other descriptor words.
> +     */
> +    rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> +
>      rxd = *rxdp;
>
>      PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u "
> --
> 2.31.1
Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
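
For readers who want to see the ordering problem outside of the driver, below
is a minimal, self-contained sketch of the pattern the patch establishes:
check the DD bit first, issue an acquire fence, then read the rest of the
descriptor. This is not the ixgbe code; the structure, ring and function
names are invented for illustration only, and the GCC/Clang builtin
__atomic_thread_fence() stands in for the rte_atomic_thread_fence() call used
in the patch.

/*
 * Sketch only: a fake "device" completes a descriptor by writing its
 * payload fields first and the DD (descriptor done) bit last; the
 * "driver" must not read the payload fields until it has observed DD.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DD_BIT 0x1u

struct fake_desc {
        uint32_t length;        /* written by the device before DD */
        uint32_t status_error;  /* DD bit written last */
};

static volatile struct fake_desc ring[16];

/*
 * Without the acquire fence between load #1 and load #2, the CPU may
 * reorder the reads and return a stale (e.g. zero) length for a
 * descriptor whose DD bit is already visible, which is the situation
 * described in the commit message above.
 */
static bool poll_one(unsigned int idx, struct fake_desc *out)
{
        uint32_t staterr = ring[idx].status_error;   /* load #1: DD check */

        if (!(staterr & DD_BIT))
                return false;

        /* Plays the same role as rte_atomic_thread_fence(__ATOMIC_ACQUIRE)
         * in the patch. */
        __atomic_thread_fence(__ATOMIC_ACQUIRE);

        out->length = ring[idx].length;              /* load #2: the rest */
        out->status_error = staterr;
        return true;
}

int main(void)
{
        struct fake_desc d;

        /* Pretend the device completed descriptor 0: data first, DD last. */
        ring[0].length = 64;
        __atomic_thread_fence(__ATOMIC_RELEASE);
        ring[0].status_error = DD_BIT;

        if (poll_one(0, &d))
                printf("descriptor 0 ready, length=%u\n", d.length);
        return 0;
}

On strongly ordered CPUs such as x86 an acquire thread fence typically
reduces to a compiler barrier, so the fix should cost nothing there, while on
weakly ordered CPUs (Arm, LoongArch) it emits the load barrier that prevents
the zero-length read described in the commit message.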