On Mon, Jan 28, 2019 at 06:45:47PM +0100, Sebastian Andrzej Siewior wrote: > During sendmsg() a cloned skb is saved via dp83640_txtstamp() in > ->tx_queue. After the NIC sends this packet, the PHY will reply with a > timestamp for that TX packet. If the cable is pulled at the right time I > don't see that packet. It might gets flushed as part of queue shutdown > on NIC's side. > Once the link is up again then after the next sendmsg() we enqueue > another skb in dp83640_txtstamp() and have two on the list. Then the PHY > will send a reply and decode_txts() attaches it to the first skb on the > list. > No crash occurs since refcounting works but we are one packet behind. > linuxptp/ptp4l usually closes the socket and opens a new one (in such a > timeout case) so those "stale" replies never get there. However it does > not resume normal operation anymore.
Thanks for the detailed explanation. This sounds like a really rare bug, but maybe you guys were able to trigger it reliably? > Purge old skbs in decode_txts(). It is too bad that the Tx timestamp from the HW doesn't provide matching fields. Using the timeout is probably the best that we can do. > Reviewed-by: Kurt Kanzenbach <k...@linutronix.de> > Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de> Order: signed-off goes before reviewed-by. > --- > drivers/net/phy/dp83640.c | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c > index 1dc043d0bc875..a50e0680a0322 100644 > --- a/drivers/net/phy/dp83640.c > +++ b/drivers/net/phy/dp83640.c > @@ -920,13 +920,13 @@ static void decode_txts(struct dp83640_private *dp83640, > { > struct skb_shared_hwtstamps shhwtstamps; > struct sk_buff *skb; > + struct dp83640_skb_info *skb_info; Reverse Christmas tree please, > u64 ns; > u8 overflow; and fix ^^^ while you are at it. > > /* We must already have the skb that triggered this. */ > - > +again: > skb = skb_dequeue(&dp83640->tx_queue); > - > if (!skb) { > pr_debug("have timestamp but tx_queue empty\n"); > return; > @@ -941,6 +941,11 @@ static void decode_txts(struct dp83640_private *dp83640, > } > return; > } > + skb_info = (struct dp83640_skb_info *)skb->cb; > + if (time_after(jiffies, skb_info->tmo)) { > + kfree_skb(skb); > + goto again; > + } > > ns = phy2txts(phy_txts); > memset(&shhwtstamps, 0, sizeof(shhwtstamps)); > @@ -1489,6 +1494,7 @@ static void dp83640_txtstamp(struct phy_device *phydev, > struct sk_buff *skb, int type) > { > struct dp83640_private *dp83640 = phydev->priv; > + struct dp83640_skb_info *skb_info = (struct dp83640_skb_info *)skb->cb; Reverse Christmas tree. Thanks, Richard > > switch (dp83640->hwts_tx_en) { > > @@ -1500,6 +1506,7 @@ static void dp83640_txtstamp(struct phy_device *phydev, > /* fall through */ > case HWTSTAMP_TX_ON: > skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS; > + skb_info->tmo = jiffies + SKB_TIMESTAMP_TIMEOUT; > skb_queue_tail(&dp83640->tx_queue, skb); > break; > > -- > 2.20.1 >