On Apr 27, 2012, at 2:07 AM, lini...@freebsd.org wrote:

> Old Synopsis: sosend sometimes return EINVAL with TSO and VLAN on 82599 NIC
> New Synopsis: [netinet] [patch] sosend sometimes return EINVAL with TSO and 
> VLAN on 82599 NIC

> http://www.freebsd.org/cgi/query-pr.cgi?pr=167325

I did an analysis of this pr a while back and figured I'd share.  It definitely 
looks like a real problem, but at least in 8.2 it is difficult to hit.  First 
off, vlan tagging is not required to trigger it.  The code in question does not 
account for any amount of link-local header, so you can reproduce the bug even 
without vlans.

In order to trigger it, the tcp stack must choose to send a tso "packet" with a 
total size (including the tcp+ip header and options, but not the link-local 
header) between 65522 and 65535 bytes, because adding the 14-byte link-local 
header then exceeds the 64K limit.  In 8.1, the tcp stack only chooses to send 
tso bursts that will result in full mtu-sized on-wire packets.  To achieve this, 
it truncates the tso packet size to a multiple of the mss, not including the 
header and tcp options.  The check has been relaxed a little in head, but the 
same basic check is still there.  None of the "normal" mtus have multiples 
falling in this range.  To reproduce it I used an mtu of 1445.  When timestamps 
are in use, every packet has a 40-byte tcp/ip header + 10 bytes for the 
timestamp option + 2 bytes of padding.  You can get a packet length of 65523 as 
follows:

65523 - (40 + 10 + 2) = 65471 (size of the tso packet data)
65471 / 47 = 1393 (data per on-wire packet, split across 47 full-size segments)
1393 + (40 + 10 + 2) = 1445 (mtu is data + header + options + pad)
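
For illustration, here is a quick back-of-the-envelope python check (my own 
sketch, assuming the burst-sizing behaviour described above: the stack sends 
the largest whole number of mss-sized segments that fits under IP_MAXPACKET, 
with a 52-byte tcp/ip+timestamp header).  It shows why an mtu of 1445 lands in 
the dangerous 65522-65535 window while ordinary mtus do not:

IP_MAXPACKET = 65535
ETHER_HDR_LEN = 14       # link-local (ethernet) header, not counted in ip_len
HDR_LEN = 40 + 10 + 2    # tcp/ip header + timestamp option + pad

def tso_burst(mtu):
    mss = mtu - HDR_LEN                # data bytes per on-wire packet
    max_data = IP_MAXPACKET - HDR_LEN  # most data a tso packet can carry
    segs = max_data // mss             # whole segments that fit
    return segs * mss + HDR_LEN        # total tso packet length (ip_len)

for mtu in (1500, 9000, 1445):
    burst = tso_burst(mtu)
    over = burst + ETHER_HDR_LEN > IP_MAXPACKET
    print(mtu, burst, "overflows the 64K dma limit" if over else "ok")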

Once you set your mtu to 1445, you need a program that can get the stack to 
send a maximum-sized packet.  Because of the congestion window, that can be 
more difficult than it sounds.  I used some python that sends enough data to 
open the window, sleeps long enough to drain all outstanding data but not long 
enough for the congestion window to go stale and close again, then sends a 
bunch more data.  It also helps to turn off delayed acks on the receiver; 
sometimes you will not drain the entire send buffer because the ack for the 
final chunk is still delayed when you start the second transmit.  When the 
problem described in the pr hits, the EINVAL from bus_dmamap_load_mbuf_sg 
bubbles right up to userspace.
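
The sending side was roughly along these lines (a sketch only; the host, port, 
chunk size, and sleep interval are placeholders you would tune for your setup 
and mtu):

import socket, time

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("receiver.example.com", 9999))   # placeholder receiver

chunk = b"x" * (1 << 20)   # enough data to open the congestion window
s.sendall(chunk)
time.sleep(0.5)            # long enough to drain the outstanding data, short
                           # enough that the window does not go stale and close
s.sendall(chunk)           # second burst; with the window open the stack can
                           # queue a maximum-sized tso packet, and when the bug
                           # hits the send fails and python raises OSError with
                           # errno EINVAL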

At first I thought this was a driver bug rather than a stack bug.  The code in 
question does what it is commented to do (limit the tso packet so that 
ip->ip_len does not overflow).  However, it also seems reasonable for the 
driver to limit its dma tag to 64K (do we really want it allocating another 
whole page just for the 14-byte link-local header?).  Perhaps the tcp stack 
should ensure that the tso packet + max_linkhdr is < 64K.  Comments?

As an aside, the patch attached to the pr is also slightly wrong.  Taking 
max_linkhdr into account when rounding the packet to a multiple of the mss does 
not make sense; it should only be taken into account when calculating the 
maximum tso length.
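
In other words (a sketch with made-up names, not the actual tcp_output() 
variables): max_linkhdr should only shrink the cap on the overall tso length, 
and the rounding to a multiple of the mss is done on the data alone.

IP_MAXPACKET = 65535

def max_tso_len(mss, hdrlen, max_linkhdr):
    # max_linkhdr only limits the overall tso packet length ...
    cap = IP_MAXPACKET - max_linkhdr
    # ... while the rounding to a whole number of mss-sized segments is done
    # on the data, without max_linkhdr mixed into it
    data = cap - hdrlen
    data -= data % mss
    return data + hdrlen    # ip_len of the tso packet; link header still fits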

  Jeremiah Lott
  Avere Systems

