Hi Thomas,
On Mon, June 12, 2023 at 6:26PM, Thomas Monjalon wrote:
15/05/2023 04:10, Zhang, Qi Z:
From: Ruifeng Wang <ruifeng.w...@arm.com>
From: Min Zhou <zhou...@loongson.cn>
Segmentation fault has been observed while running the
ixgbe_recv_pkts_lro() function to receive packets on the Loongson
3C5000 processor which has 64 cores and 4 NUMA nodes.
From the ixgbe_recv_pkts_lro() function, we found that as long as the
first packet has the EOP bit set, and the length of this packet is
less than or equal to rxq->crc_len, the segmentation fault will
definitely happen even though on the other platforms. For example, if
we made the first packet which had the EOP bit set had a zero length by
force, the segmentation fault would happen on X86.
Because when processd the first packet the first_seg->next will be
NULL, if at the same time this packet has the EOP bit set and its
length is less than or equal to rxq->crc_len, the following loop will be
executed:
for (lp = first_seg; lp->next != rxm; lp = lp->next)
;
We know that the first_seg->next will be NULL under this condition. So
the expression of
lp->next->next will cause the segmentation fault.
Normally, the length of the first packet with EOP bit set will be
greater than rxq-
crc_len. However, the out-of-order execution of CPU may make the read
ordering of the
status and the rest of the descriptor fields in this function not be
correct. The related codes are as following:
rxdp = &rx_ring[rx_id];
#1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);
if (!(staterr & IXGBE_RXDADV_STAT_DD))
break;
#2 rxd = *rxdp;
The sentence #2 may be executed before sentence #1. This action is
likely to make the ready packet zero length. If the packet is the
first packet and has the EOP bit set, the above segmentation fault will
happen.
So, we should add a proper memory barrier to ensure the read ordering
be correct. We also did the same thing in the ixgbe_recv_pkts()
function to make the rxd data be valid even though we did not find
segmentation fault in this function.
Fixes: 8eecb3295ae ("ixgbe: add LRO support")
Cc: sta...@dpdk.org
Signed-off-by: Min Zhou <zhou...@loongson.cn>
---
v3:
- Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb()
---
v2:
- Make the calling of rte_rmb() for all platforms
---
[...]
Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
Applied to dpdk-next-net-intel.
Thanks
Qi
Why ignoring checkpatch?
It is saying:
"
Warning in drivers/net/ixgbe/ixgbe_rxtx.c:
Using rte_smp_[r/w]mb
"
I'm sorry. Should we never use rte_smp_[r/w]mb in the driver's code?
Ruifeng proposed "rte_atomic_thread_fence(__ATOMIC_ACQUIRE)"
in a comment on the v2.
Thanks, I see. I think I also can use rte_atomic_thread_fence() to solve
this problem. I will send the V4 patch.
I will drop this patch from the pull of next-net-intel branch.