On 06/22/2016 11:22 AM, Yuval Mintz wrote:
> But seriously, this isn't really anything new but rather a step forward in
> the direction we've already taken - bnx2x/qede are already performing
> the same for non-encapsulated TCP.

Since you mention bnx2x... I would argue that the firmware on the
NICs driven by bnx2x is doing it badly. Not so much from a
functional standpoint, I suppose, but from a performance one. The
NIC-firmware GRO done there has this rather unfortunate assumption that
"all MSSes will be directly driven by my own physical MTU", and when it
sees segments of a size other than the one suggested by the physical
MTU, it will coalesce only two segments together. Those two-segment
aggregates then do not get further coalesced in the stack.
Suffice it to say this does not do well from a performance standpoint.
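
To put rough numbers on that, here is a toy model, not a description of the actual firmware. It assumes that a run of segments matching the MSS implied by the physical MTU is coalesced up to some cap, while any other segment size is coalesced only in pairs. The cap of 16 segments, the 1460-byte expected MSS, the 1410-byte "other" size, and the function name are all made up for illustration.

```python
def aggregates_delivered(num_segments, segment_size, expected_mss,
                         max_per_aggregate=16):
    """Count the aggregates handed up the stack under the toy model.

    max_per_aggregate is a hypothetical cap, not a real firmware limit.
    """
    # Matching segments coalesce up to the cap; anything else only in pairs.
    per_aggregate = max_per_aggregate if segment_size == expected_mss else 2
    # Ceiling division: a partial final aggregate still costs one delivery.
    return -(-num_segments // per_aggregate)


if __name__ == "__main__":
    expected = 1460  # assumed: 1500-byte MTU, IPv4 + TCP, no TCP options
    print(aggregates_delivered(64, 1460, expected))
    print(aggregates_delivered(64, 1410, expected))
```

Under these assumed numbers the mismatched flow hands the stack several times as many aggregates for the same 64 segments, and since the stack does not coalesce them further, that per-aggregate cost is paid in full.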
One can disable LRO via ethtool for these NICs, but what that does is
disable old-school LRO, not GRO-in-the-NIC. To get that disabled, one
must also get the bnx2x module loaded with "disable-tpa=1" so the Linux
stack GRO gets used instead.
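
A minimal sketch of those two steps, written as a helper that only builds the command lines and does not run them. The interface name "eth0" and the function name are hypothetical. The module parameter is spelled with an underscore here on the assumption that this is how the driver declares it. Reloading the module takes down every interface it drives, so treat this as an outline rather than something to paste onto a production box.

```python
import shlex


def bnx2x_stack_gro_commands(interface, module="bnx2x"):
    """Return argv lists, in order, for switching to Linux stack GRO.

    Nothing is executed here; the caller decides whether and how to run them.
    """
    return [
        # Reload the driver with TPA (the in-NIC aggregation) disabled.
        ["modprobe", "-r", module],
        ["modprobe", module, "disable_tpa=1"],
        # Turn off old-school LRO as well, once the interface is back.
        ["ethtool", "-K", interface, "lro", "off"],
    ]


if __name__ == "__main__":
    for argv in bnx2x_stack_gro_commands("eth0"):  # "eth0" is a placeholder
        print(shlex.join(argv))
```

The ethtool step comes last because offload settings made before a driver reload would not be expected to survive it.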
Had the bnx2x-driven NICs' firmware not had that rather unfortunate
assumption about MSSes, I probably would never have noticed.
happy benchmarking,
rick jones