On Fri, Jun 06, 2025 at 10:49:03PM +0300, Mikhail Kshevetskiy wrote:
> ARCH_DMA_MINALIGN is 64 for ARMv7a/ARMv8a architectures, but RX/TX
> descriptors are 32 bytes long. So they may not be aligned on an
> ARCH_DMA_MINALIGN boundary. In the case of the RX path, this may cause
> the following problem:
>
> 1) Assume that a packet has arrived and the EVEN rx descriptor has been
> updated with the incoming data. The driver will invalidate and check
> the corresponding rx descriptor.
>
> 2) Now suppose the next descriptor (ODD) has not yet completed.
>
> Please note that all even descriptors start on a 64-byte boundary,
> and the odd ones are NOT aligned on a 64-byte boundary.
>
> When inspecting an even descriptor, we will read the entire CPU cache
> line (64 bytes). So we also read, and store in the CPU cache, the next
> (odd) descriptor.
>
> 3) Now suppose the next packet (for the odd rx descriptor) arrived
> while the first packet was being processed. So we have new data
> in memory but old data in the cache.
>
> 4) After packet processing (in the arht_eth_free_pkt() function) we will
> clean up the descriptor and put it back into the rx queue.
>
> This will call the flush_dcache_range() function for the even descriptor,
> so the odd one will be flushed as well (it is in the same cache line).
> So the old data will be written over the next rx descriptor.
>
> 5) We get a freeze. The next descriptor is empty (so the driver is
> waiting for packets), but the hardware will continue to receive
> packets on the other available descriptors. This will continue until
> the last available rx descriptor is full. Then the hardware will
> also freeze.
>
> The problem is solved if the previous descriptor is put back into the
> queue instead of the current one.
>
> If the current descriptor is even (starts on a 64-byte boundary),
> then putting the previous descriptor into the rx queue will affect
> the previous cache line. To be 100% safe, we must make sure that the
> previous descriptor and the one before it cannot be used
> for receiving at this moment.
>
> If the current descriptor is odd, then the previous descriptor is on
> the same cache line. Both (current and previous) descriptors are not
> currently in use, so the issue will not arise.
>
> WARNING: The following restrictions on PKTBUFSRX must hold:
>  * PKTBUFSRX is even,
>  * PKTBUFSRX >= 4.
>
> The bug appears on the 32-bit airoha platform, but should be present on
> 64-bit as well.
>
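A minimal illustrative sketch of the aliasing described above (not the driver code: struct rx_desc, ring[] and writeback_desc() are invented names, and the U-Boot headers and the flush_dcache_range() call are assumptions based on the functions the commit message mentions):

#include <cpu_func.h>		/* flush_dcache_range() -- assumed header */
#include <asm/cache.h>		/* ARCH_DMA_MINALIGN -- assumed header */
#include <linux/types.h>

struct rx_desc {		/* 32 bytes, same size as a QDMA rx descriptor */
	u32 word[8];
};

/* Only even ring entries start on an ARCH_DMA_MINALIGN (64-byte) boundary. */
static struct rx_desc ring[16] __attribute__((aligned(ARCH_DMA_MINALIGN)));

static void writeback_desc(int i)
{
	unsigned long line = (unsigned long)&ring[i] & ~(ARCH_DMA_MINALIGN - 1UL);

	/*
	 * For an even i this writes back the whole 64-byte line, i.e. the
	 * even descriptor *and* its odd neighbour. If the hardware has just
	 * updated the odd descriptor in DRAM, the stale cached copy
	 * overwrites it, which is exactly the scenario in steps 1)-5) above.
	 */
	flush_dcache_range(line, line + ARCH_DMA_MINALIGN);
}
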
Thanks a lot for bisecting this! I indeed had some reports of the driver
getting stuck sometimes, but I had sorted them out by making the CPU IDX
handling more dynamic (by reading the register every time; maybe this way
I was indirectly invalidating the cache???). Anyway, I will test whether
this works correctly on an7581 and then add my ack tag.

> Signed-off-by: Mikhail Kshevetskiy <mikhail.kshevets...@iopsys.eu>
> ---
>  drivers/net/airoha_eth.c | 35 ++++++++++++++++++++++++++---------
>  1 file changed, 26 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/airoha_eth.c b/drivers/net/airoha_eth.c
> index e8565be9b3c..d168dda1b62 100644
> --- a/drivers/net/airoha_eth.c
> +++ b/drivers/net/airoha_eth.c
> @@ -392,13 +392,14 @@ static int airoha_fe_init(struct airoha_eth *eth)
>  	return 0;
>  }
>
> -static void airoha_qdma_reset_rx_desc(struct airoha_queue *q, int index,
> -				      uchar *rx_packet)
> +static void airoha_qdma_reset_rx_desc(struct airoha_queue *q, int index)
>  {
>  	struct airoha_qdma_desc *desc;
> +	uchar *rx_packet;
>  	u32 val;
>
>  	desc = &q->desc[index];
> +	rx_packet = net_rx_packets[index];
>  	index = (index + 1) % q->ndesc;
>
>  	dma_map_single(rx_packet, PKTSIZE_ALIGN, DMA_TO_DEVICE);
> @@ -420,7 +421,7 @@ static void airoha_qdma_init_rx_desc(struct airoha_queue *q)
>  	int i;
>
>  	for (i = 0; i < q->ndesc; i++)
> -		airoha_qdma_reset_rx_desc(q, i, net_rx_packets[i]);
> +		airoha_qdma_reset_rx_desc(q, i);
>  }
>
>  static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
> @@ -444,10 +445,14 @@ static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
>  			RX_RING_SIZE_MASK,
>  			FIELD_PREP(RX_RING_SIZE_MASK, ndesc));
>
> +	/*
> +	 * See arht_eth_free_pkt() for the reasons used to fill
> +	 * REG_RX_CPU_IDX(qid) register.
> +	 */
>  	airoha_qdma_rmw(qdma, REG_RX_RING_SIZE(qid), RX_RING_THR_MASK,
>  			FIELD_PREP(RX_RING_THR_MASK, 0));
>  	airoha_qdma_rmw(qdma, REG_RX_CPU_IDX(qid), RX_RING_CPU_IDX_MASK,
> -			FIELD_PREP(RX_RING_CPU_IDX_MASK, q->ndesc - 1));
> +			FIELD_PREP(RX_RING_CPU_IDX_MASK, q->ndesc - 3));
>  	airoha_qdma_rmw(qdma, REG_RX_DMA_IDX(qid), RX_RING_DMA_IDX_MASK,
>  			FIELD_PREP(RX_RING_DMA_IDX_MASK, q->head));
>
> @@ -906,6 +911,7 @@ static int arht_eth_free_pkt(struct udevice *dev, uchar *packet, int length)
>  	struct airoha_qdma *qdma = &eth->qdma[0];
>  	struct airoha_queue *q;
>  	int qid;
> +	u16 prev, pprev;
>
>  	if (!packet)
>  		return 0;
>
> @@ -913,13 +919,24 @@ static int arht_eth_free_pkt(struct udevice *dev, uchar *packet, int length)
>  	qid = 0;
>  	q = &qdma->q_rx[qid];
>
> -	dma_map_single(packet, length, DMA_TO_DEVICE);
> -
> -	airoha_qdma_reset_rx_desc(q, q->head, packet);
> +	/*
> +	 * Due to cpu cache issue the airoha_qdma_reset_rx_desc() function
> +	 * will always touch 2 descriptors:
> +	 * - if current descriptor is even, then the previous and the one
> +	 *   before previous descriptors will be touched (previous cacheline)
> +	 * - if current descriptor is odd, then only current and previous
> +	 *   descriptors will be touched (current cacheline)
> +	 *
> +	 * Thus, to prevent possible destroying of rx queue, only (q->ndesc - 2)
> +	 * descriptors might be used for packet receiving.
> +	 */
> +	prev = (q->head + q->ndesc - 1) % q->ndesc;
> +	pprev = (q->head + q->ndesc - 2) % q->ndesc;
> +	q->head = (q->head + 1) % q->ndesc;
>
> +	airoha_qdma_reset_rx_desc(q, prev);
>  	airoha_qdma_rmw(qdma, REG_RX_CPU_IDX(qid), RX_RING_CPU_IDX_MASK,
> -			FIELD_PREP(RX_RING_CPU_IDX_MASK, q->head));
> -	q->head = (q->head + 1) % q->ndesc;
> +			FIELD_PREP(RX_RING_CPU_IDX_MASK, pprev));
>
>  	return 0;
>  }
> --
> 2.47.2
>

-- 
Ansuel
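As a standalone worked example of the prev/pprev arithmetic the hunk above adds to arht_eth_free_pkt() (plain C, illustrative only; the ring size of 4 and the trace loop are assumptions chosen just to show the guard band):

#include <stdio.h>

/*
 * Illustrative trace of the index scheme from the patch: on each freed
 * packet the driver refills descriptor (head - 1) and advertises
 * (head - 2) as the CPU index, so two slots always stay reserved between
 * the descriptor being flushed and the ones the hardware may still write.
 */
int main(void)
{
	unsigned int ndesc = 4;		/* assumed ring size (PKTBUFSRX) */
	unsigned int head = 0;

	for (int pkt = 0; pkt < 6; pkt++) {
		unsigned int prev  = (head + ndesc - 1) % ndesc;	/* refilled/flushed now */
		unsigned int pprev = (head + ndesc - 2) % ndesc;	/* written to REG_RX_CPU_IDX */

		head = (head + 1) % ndesc;
		printf("pkt %d: refill desc %u, cpu idx %u, next head %u\n",
		       pkt, prev, pprev, head);
	}

	return 0;
}

With an even PKTBUFSRX >= 4, the cache line flushed for prev (which may also cover pprev) never overlaps a descriptor the hardware is still allowed to fill, which is the reserve described in the comment added by the diff.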