On 07.09.2012 23:44, Jeremiah Lott wrote:
On Apr 27, 2012, at 2:07 AM, lini...@freebsd.org wrote:
Old Synopsis: sosend sometimes return EINVAL with TSO and VLAN on 82599 NIC
New Synopsis: [netinet] [patch] sosend sometimes return EINVAL with TSO and VLAN on 82599 NIC
http://www.freebsd.org/cgi/query-pr.cgi?pr=167325
I did an analysis of this pr a while back and I figured I'd share. It
definitely looks like a real problem, but at least in 8.2 it is difficult to
hit. First off, vlan tagging is not required to hit this. The code in
question does not account for any amount of link-local header, so you can
reproduce the bug even without vlans.
In order to trigger it, the tcp stack must choose to send a tso "packet" with
a total size (including tcp+ip header and options, but not the link-local
header) between 65522 and 65535 bytes, because adding the 14-byte link-local
header then exceeds the 64K limit. In 8.1, the tcp stack only chooses to send
tso bursts that will result in full mtu-size on-wire packets. To achieve this,
it truncates the tso packet size to a multiple of the mss, not including the
header and tcp options. The check has been relaxed a little in head, but the
same basic check is still there.
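The window described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not kernel code: the constant names mirror IP_MAXPACKET and ETHER_HDR_LEN, and the assumption that the driver's limit is exactly 65535 bytes is inferred from the 65522..65535 range quoted in the message.

```python
# Assumption: the "64K limit" is 65535 bytes, the largest value ip_len can hold.
IP_MAXPACKET = 65535
# Assumption: a plain 14-byte Ethernet header, as in the message.
ETHER_HDR_LEN = 14


def exceeds_limit(ip_total_len, link_hdr_len=ETHER_HDR_LEN, limit=IP_MAXPACKET):
    """True if the tso packet plus the link-local header no longer fits."""
    return ip_total_len + link_hdr_len > limit


# Every tso packet size that fits in ip_len but not once the header is added.
danger = [n for n in range(IP_MAXPACKET + 1) if exceeds_limit(n)]
print(danger[0], danger[-1])  # 65522 65535
```

Any tso packet whose tcp+ip length lands in that list is accepted by the stack but is too large once the link-local header is prepended.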
None of the "normal" mtus have multiples falling in this range. To reproduce
it I used an mtu of 1445. When timestamps are in use, every packet has a
40-byte tcp/ip header + 10 bytes for the timestamp option + 2 bytes of pad.
You can get a packet length of 65523 as follows:
  65523 - (40 + 10 + 2) = 65471  (size of tso packet data)
  65471 / 47            = 1393   (size of data per on-wire packet)
  1393 + (40 + 10 + 2)  = 1445   (mtu is data + header + options + pad)
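The same arithmetic can be checked for other mtus with a small model of the truncation step. This is a simplified sketch of the behaviour described in the message, not the actual tcp_output code; the function name tso_total_len is hypothetical, and it assumes a 40-byte tcp/ip header plus 12 bytes of timestamp option and pad on every packet.

```python
IP_MAXPACKET = 65535  # assumed upper bound on ip_len
LINK_HDR_LEN = 14     # assumed Ethernet header length


def tso_total_len(mtu, hdr_len=40, opt_len=12):
    """Model: largest tso packet whose data is a whole multiple of the mss."""
    mss_data = mtu - hdr_len - opt_len            # data per on-wire packet
    max_data = IP_MAXPACKET - hdr_len - opt_len   # data that fits in ip_len
    data = (max_data // mss_data) * mss_data      # round down to mss multiple
    return data + hdr_len + opt_len


for mtu in (1500, 9000, 1445):
    total = tso_total_len(mtu)
    print(mtu, total, total + LINK_HDR_LEN > IP_MAXPACKET)
# 1500 65212 False
# 9000 62688 False
# 1445 65523 True
```

Under these assumptions only the 1445 mtu lands in the problematic range, which matches the 65523-byte packet worked out above.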
Once you set your mtu to 1445, you need a program that can get the stack to
send a maximum-sized packet. With the congestion window, that can be more
difficult than it seems. I used some python that sends enough data to open
the window, sleeps long enough to drain all outstanding data, but not long
enough for the congestion window to go stale and close again, then sends a
bunch more data. It also helps to turn off delayed acks on the receiver;
otherwise you will sometimes not drain the entire send buffer, because an ack
for the final chunk is still delayed when you start the second transmit. When
the problem described in the pr hits, the EINVAL from bus_dmamap_load_mbuf_sg
bubbles right up to userspace.
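The original script was not posted, so the following is only a guess at its general shape: send a warm-up burst, pause, send a second burst, and report the errno if a send fails. The function names, chunk size, and parameters are all hypothetical; the right byte counts and sleep time depend on the path and have to be found by experiment.

```python
import socket
import time


def send_bytes(sock: socket.socket, total: int) -> None:
    """Write `total` bytes of filler to the socket in 64K chunks."""
    chunk = b"x" * 65536
    remaining = total
    while remaining > 0:
        piece = chunk[:min(len(chunk), remaining)]
        sock.sendall(piece)
        remaining -= len(piece)


def burst_send(sock: socket.socket, warmup_bytes: int, pause_s: float,
               burst_bytes: int):
    """Open the window, let it drain, then send again.

    Returns the errno of a failed send, or None if both bursts succeeded.
    """
    try:
        send_bytes(sock, warmup_bytes)   # grow the congestion window
        time.sleep(pause_s)              # drain outstanding data, stay warm
        send_bytes(sock, burst_bytes)    # invite a maximum-sized tso packet
    except OSError as exc:
        return exc.errno
    return None
```

Against a real receiver, the socket would come from socket.create_connection((host, port)), and a return value equal to errno.EINVAL would indicate the failure described in the pr.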
At first I thought this was a driver bug rather than a stack bug. The code in
question does what it is commented to do (limit the tso packet so that
ip->ip_len does not overflow). However, it also seems reasonable that the
driver limits its dma tag to 64K (do we really want it allocating another
whole page just for the 14-byte link-local header?). Perhaps the tcp stack
should ensure that the tso packet + max_linkhdr is < 64K. Comments?
Thank you for the analysis. I'm looking into it.
As an aside, the patch attached to the pr is also slightly wrong. Taking
max_linkhdr into account when rounding the packet to a multiple of the mss
does not make sense; it should only be taken into account when calculating
the max tso length.
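A rough model of that distinction, assuming the same 1445 mtu and 52 bytes of header and options as in the analysis: the link header shrinks only the cap on the tso length, while the rounding still uses the per-packet data size alone. The function name tso_data_len is hypothetical, and the 14 passed for max_linkhdr is just the Ethernet header length from the message, not the kernel's actual max_linkhdr value.

```python
IP_MAXPACKET = 65535  # assumed upper bound on ip_len


def tso_data_len(mss_data, hdr_len, max_linkhdr=0):
    """Cap the tso length first, then round down to a multiple of the mss.

    max_linkhdr is subtracted from the cap only; it plays no part in the
    rounding, which depends on the per-packet data size alone.
    """
    max_data = IP_MAXPACKET - hdr_len - max_linkhdr
    return (max_data // mss_data) * mss_data


old = tso_data_len(1393, 52)        # cap ignores the link header
new = tso_data_len(1393, 52, 14)    # cap leaves room for the link header
print(old + 52 + 14, new + 52 + 14)  # 65537 64144
```

With the link header included in the cap, the burst drops from 47 to 46 full-sized segments and the total stays under the limit.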
--
Andre
_______________________________________________
freebsd-net@freebsd.org mailing list