On 7/31/17 4:01 PM, Cong Wang wrote:
>>> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
>>> index 3a19ea28339f..37db087b6c97 100644
>>> --- a/net/ipv4/tcp_ipv4.c
>>> +++ b/net/ipv4/tcp_ipv4.c
>>> @@ -1855,7 +1855,7 @@ void inet_sk_rx_dst_set(struct sock *sk, const
>>> struct sk_buff *skb)
>>>  {
>>>         struct dst_entry *dst = skb_dst(skb);
>>>
>>> -       if (dst && dst_hold_safe(dst)) {
>>> +       if (0 && dst && dst_hold_safe(dst)) {
>>>                 sk->sk_rx_dst = dst;
>>>                 inet_sk(sk)->rx_dst_ifindex = skb->skb_iif;
>>>         }
>>
>>
>> This removes the 200s stall (the test is IPv4/TCP based)
>
>
> Interesting. This means we have a kernel socket which holds
> the dst refcnt.

Right now there is no tracking that I am aware of for a dst cached on a socket (short of walking all sockets). I have been bitten by this several times while trying to make various changes. It is basically a hidden reference on the device.
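
To make the "hidden reference" concrete, here is a toy userspace model of the chain socket -> dst -> device. It is only a sketch, not kernel code: every toy_* name is made up for illustration, and the real refcounting has far more detail. It assumes only that a dst pins its device while the dst is alive, and that caching a dst on a socket takes an extra dst reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these types are the kernel's. */
struct toy_dev  { int refcnt; };
struct toy_dst  { int refcnt; struct toy_dev *dev; };
struct toy_sock { struct toy_dst *rx_dst; };

/* A dst pins its device for as long as the dst itself is alive. */
static void toy_dst_init(struct toy_dst *dst, struct toy_dev *dev)
{
	dst->refcnt = 1;	/* reference owned by the routing code */
	dst->dev = dev;
	dev->refcnt++;
}

static void toy_dst_release(struct toy_dst *dst)
{
	assert(dst->refcnt > 0);
	if (--dst->refcnt == 0)
		dst->dev->refcnt--;	/* last user gone: unpin the device */
}

/* Models the caching step: the socket takes its own dst reference. */
static void toy_sk_rx_dst_set(struct toy_sock *sk, struct toy_dst *dst)
{
	dst->refcnt++;
	sk->rx_dst = dst;
}

static void toy_sk_rx_dst_drop(struct toy_sock *sk)
{
	if (sk->rx_dst) {
		toy_dst_release(sk->rx_dst);
		sk->rx_dst = NULL;
	}
}

/* Unregister can only complete once nothing pins the device. */
static bool toy_dev_can_unregister(const struct toy_dev *dev)
{
	return dev->refcnt == 0;
}
```

In this model, after the routing code calls toy_dst_release() the device is still pinned until toy_sk_rx_dst_drop() runs on the socket, and nothing on the device side records which socket that is.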