On 5.12.2018 14:37, claudiu.bez...@microchip.com wrote: > > On 30.11.2018 20:21, Anssi Hannula wrote: >> When reading buffer descriptors on RX or on TX completion, an >> RX_USED/TX_USED bit is checked first to ensure that the descriptor has >> been populated. However, there are no memory barriers to ensure that the >> data protected by the RX_USED/TX_USED bit is up-to-date with respect to >> that bit. >> >> Fix that by adding DMA read memory barriers on those paths. >> >> I did not observe any actual issues caused by these being missing, >> though. >> >> Tested on a ZynqMP based system. >> >> Signed-off-by: Anssi Hannula <anssi.hann...@bitwise.fi> >> Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver") >> Cc: Nicolas Ferre <nicolas.fe...@microchip.com> >> --- >> drivers/net/ethernet/cadence/macb_main.c | 20 ++++++++++++++++---- >> 1 file changed, 16 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/net/ethernet/cadence/macb_main.c >> b/drivers/net/ethernet/cadence/macb_main.c >> index 430b7a0f5436..c93baa8621d5 100644 >> --- a/drivers/net/ethernet/cadence/macb_main.c >> +++ b/drivers/net/ethernet/cadence/macb_main.c >> @@ -861,6 +861,11 @@ static void macb_tx_interrupt(struct macb_queue *queue) >> >> /* First, update TX stats if needed */ >> if (skb) { >> + /* Ensure all of desc is at least as up-to-date >> + * as ctrl (TX_USED bit) >> + */ >> + dma_rmb(); >> + > Is this necessary? Wouldn't previous rmb() take care of this? At this time > data specific to this descriptor was completed. The TX descriptors for next > data to be send is updated under a locked spinlock.
The previous rmb() is before the TX_USED check, so my understanding is that the following could happen in theory: 1. rmb(). 2. Reads are reordered so that TX timestamp is read first - no barriers are crossed. 3. HW writes timestamp and sets TX_USED (or they become visible). 4. Code checks TX_USED. 5. Code operates on timestamp that is actually garbage. I'm not 100% sure that there isn't some lighter/cleaner way to do this than dma_rmb(), though. >> if (gem_ptp_do_txstamp(queue, skb, desc) == 0) { >> /* skb now belongs to timestamp buffer >> * and will be removed later >> @@ -1000,12 +1005,16 @@ static int gem_rx(struct macb_queue *queue, int >> budget) >> rmb(); >> >> rxused = (desc->addr & MACB_BIT(RX_USED)) ? true : false; >> - addr = macb_get_addr(bp, desc); >> - ctrl = desc->ctrl; >> >> if (!rxused) >> break; >> >> + /* Ensure other data is at least as up-to-date as rxused */ >> + dma_rmb(); > Same here, wouldn't previous rmb() should do this job? The scenario I'm concerned about here (and in the last hunk) is: 1. rmb(). 2. ctrl is read (i.e. ctrl read reordered before addr read). 3. HW updates to ctrl and addr become visible. 4. RX_USED check. 5. code operates on garbage ctrl. I think it may be OK to move the earlier rmb() outside the loop so that there is an rmb() only before and after the RX loop, as I don't at least immediately see any hard requirement to do it on each loop pass (unlike the added dma_rmb()). But my intent was to fix issues instead of optimization so I didn't look too closely into that. >> + >> + addr = macb_get_addr(bp, desc); >> + ctrl = desc->ctrl; >> + >> queue->rx_tail++; >> count++; >> >> @@ -1180,11 +1189,14 @@ static int macb_rx(struct macb_queue *queue, int >> budget) >> /* Make hw descriptor updates visible to CPU */ >> rmb(); >> >> - ctrl = desc->ctrl; >> - >> if (!(desc->addr & MACB_BIT(RX_USED))) >> break; >> >> + /* Ensure other data is at least as up-to-date as addr */ >> + dma_rmb(); > Ditto > >> + >> + ctrl = desc->ctrl; >> + >> if (ctrl & MACB_BIT(RX_SOF)) { >> if (first_frag != -1) >> discard_partial_frame(queue, first_frag, tail); >> -- Anssi Hannula / Bitwise Oy +358 503803997