Implementation details:
1. Handle packet chains from multiple sessions(current default
MAX_LRO_SESSSIONS=32).
2. Examine each packet for eligiblity to aggregate. A packet is
considered eligible if it meets all the below criteria.
a. It is a TCP/IP packet and L2 type is not LLC or SNAP.
b. The packet has no checksum errors(L3 and L4).
c. There are no TCP or IP options.
_No_ TCP options? Not even Timestamps? Given that one can theoretically wrap
the 32-bit TCP sequence space in something like four seconds, and the general
goodness of timestamps when using window scaling, one might think that
timestamps being enabled if not already common today would become more common?
d. Search and locate the LRO object corresponding to this
socket and ensure packet is in TCP sequence.
e. It's not a special packet(SYN, FIN, RST, URG, PSH etc. flags are not set).
f. TCP payload is non-zero(It's not a pure ACK).
g. It's not an IP-fragmented packet.
3. If a packet is found eligible, the LRO object is updated with
information such as next sequence number expected, current length
of aggregated packet and so on. If not eligible or max packets
reached, update IP and TCP headers of first packet in the chain
and pass it up to stack.
4. The frag_list in skb structure is used to chain packets into one
large packet.
Kernel changes required: None
Performance results:
Main focus of the initial testing was on 1500 mtu receiver, since this
is a bottleneck not covered by the existing stateless offloads.
There are couple disclaimers about the performance results below:
1. Your mileage will vary!!!! We initially concentrated on couple pci-x 2.0
platforms that are powerful enough to push 10 GbE NIC and do not
have bottlenecks other than cpu%; testing on other platforms is still
in progress. On some lower end systems we are seeing lower gains.
You should still see benefits in reported service demand no?
2. Current LRO implementation is still (for the most part) software based,
and therefore performance potential of the feature is far from being realized.
Full hw implementation of LRO is expected in the next version of Xframe ASIC.
Performance delta(with MTU=1500) going from LRO disabled to enabled:
IBM 2-way Xeon (x366) : 3.5 to 7.1 Gbps
2-way Opteron : 4.5 to 6.1 Gbps
Service demand changes?
rick jones
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html