On Mon, Oct 21, 2019 at 8:04 PM Subash Abhinov Kasiviswanathan
<subas...@codeaurora.org> wrote:
>
> > Interesting! As tcp_input.c summarizes, "packets_out is
> > SND.NXT-SND.UNA counted in packets". In the normal operation of a
> > socket, tp->packets_out should not be 0 if any of those other fields
> > are non-zero.
> >
> > The tcp_write_queue_purge() function sets packets_out to 0:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/net/ipv4/tcp.c?h=v4.19#n2526
> >
> > So the execution of tcp_write_queue_purge() before this point is one
> > way for the socket to end up in this weird state.
> >
>
> In one of the instances, the values are tp->snd_nxt = 1016118098,
> tp->snd_una = 1016047820
>
> tp->mss_cache = 1378
>
> I assume the number of outstanding segments should be
> (tp->snd_nxt - tp->snd_una)/tp->mss_cache = 51

That would be a good expectation if all the packets were full-sized.

> tp->packets_out = 0 and tp->sacked_out = 158 in this case.

OK, thanks. It could be that sacked_out is reasonable and some of the
packets were not full-sized. But, as discussed above, typically the
packets_out should not be 0 if sacked_out is non-zero (with at least
the exception of the tcp_write_queue_purge() case).

> >> > Yes, one guess would be that somehow the skbs in the retransmit queue
> >> > have been freed, but tp->sacked_out is still non-zero and
> >> > tp->highest_sack is still a dangling pointer into one of those freed
> >> > skbs. The tcp_write_queue_purge() function is one function that frees
> >> > the skbs in the retransmit queue and leaves tp->sacked_out as non-zero
> >> > and tp->highest_sack as a dangling pointer to a freed skb, AFAICT, so
> >> > that's why I'm wondering about that function. I can't think of a
> >> > specific sequence of events that would involve tcp_write_queue_purge()
> >> > and then a socket that's still in FIN-WAIT1. Maybe I'm not being
> >> > creative enough, or maybe that guess is on the wrong track. Would you
> >> > be able to set a new bit in the tcp_sock in tcp_write_queue_purge()
> >> > and log it in your instrumentation point, to see if
> >> > tcp_write_queue_purge() was called for these connections that cause
> >> > this crash?
>
> I've queued up a build which logs calls to tcp_write_queue_purge and
> clears tp->highest_sack and tp->sacked_out. I will let you know how
> it fares by end of week.

OK, thanks. That should be a useful data point.

cheers,
neal