On Thu, Jun 23, 2016 at 3:08 AM, Arjun V. <ar...@chelsio.com> wrote:
> Hi all,
>
> The following patch introduced a regression in Chelsio cxgb4 driver, causing
> port failure when running heavy TSO traffic:
>
> commit 10d3be569243def8d92ac3722395ef5a59c504e6
> Author: Eric Dumazet <eduma...@google.com>
> Date: Thu Apr 21 10:55:23 2016 -0700
>
> tcp-tso: do not split TSO packets at retransmit time
>
> Linux TCP stack painfully segments all TSO/GSO packets before retransmits.
>
> This was fine back in the days when TSO/GSO were emerging, with their
> bugs, but we believe the dark age is over.
>
> Keeping big packets in write queues, but also in stack traversal
> has a lot of benefits.
> - Less memory overhead, because write queues have less skbs
> - Less cpu overhead at ACK processing.
> - Better SACK processing, as lot of studies mentioned how
>   awful linux was at this ;)
> - Less cpu overhead to send the rtx packets
>   (IP stack traversal, netfilter traversal, drivers...)
> - Better latencies in presence of losses.
> - Smaller spikes in fq like packet schedulers, as retransmits
>   are not constrained by TCP Small Queues.
>
> 1 % packet losses are common today, and at 100Gbit speeds, this
> translates to ~80,000 losses per second.
> Losses are often correlated, and we see many retransmit events
> leading to 1-MSS train of packets, at the time hosts are already
> under stress.
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
> Acked-by: Yuchung Cheng <ych...@google.com>
> Signed-off-by: David S. Miller da...@davemloft.net
>
> When the number of TCP retransmissions are quite high, the packet length
> coming from stack does not seems to be proper, due to which our TSO module
> gets stuck.
> If I change segs back to 1 in __tcp_retransmit_skb(), traffic is running
> fine. Please let us know if we are missing something.
>
> Thanks,
> Arjun.
>
Hmm... I see nothing wrong in the TCP stack. Can you give me more details on the wrong packet length you see?
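
For reference, the "~80,000 losses per second" figure in the quoted commit message can be reproduced with a rough back-of-the-envelope calculation. This is only a sketch: the 1500-byte packet size is an assumption (the commit message does not state one), and the function name is made up for illustration.

```python
def losses_per_second(link_bps, packet_bytes, loss_rate):
    """Estimate lost packets per second on a fully loaded link.

    link_bps     -- link speed in bits per second
    packet_bytes -- assumed size of each packet on the wire, in bytes
    loss_rate    -- fraction of packets lost (0.01 means 1 %)
    """
    packets_per_second = link_bps / (packet_bytes * 8)
    return packets_per_second * loss_rate


# 100 Gbit/s link, 1500-byte packets (assumed), 1 % loss
rate = losses_per_second(100e9, 1500, 0.01)
print(round(rate))
```

With these assumptions the result is on the order of 80,000 losses per second, in line with the number quoted above.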