Re: [ovs-discuss] Intra-Bridge Perfomance issue

Mike Collins Thu, 30 Aug 2012 18:16:08 -0700


On Aug 30, 2012, at 5:50 PM, Jesse Gross <je...@nicira.com> wrote:

On Wed, Aug 29, 2012 at 7:43 PM, Mike Collins
<mike.a.coll...@ark-net.org> wrote:
On Aug 29, 2012, at 9:27 PM, Jesse Gross <je...@nicira.com> wrote:
On Wed, Aug 29, 2012 at 6:19 PM, Michael A. Collins
<mike.a.coll...@ark-net.org> wrote:
I have several xensource servers running lots of PV-On-HVM Windows DomUs and
I have a pretty weird problem.  Here are my details:
Kernel: 3.5.0-rc2
OpenvSwitch module: Built-in from upstream (aka did not install kernel
module when building OpenvSwitch)
OpenvSwitch userland tools: version 1.4.0

I have a single Bridge with two fake-bridges.
I have configured a LACP bond with 4 physical nics that connects to a
PortChannel on a 6509.
I setup the native vlan on the 6509 to be 102.
I have configured the bond with vlan_mode=native-untagged and tag=102.
All my vms are added to the fake-bridge associated with vlan 102.
I have four servers configured this way all connected to the same 6509,
ServerA, ServerB, ServerC, and ServerD.
I have no problem sending and receiving traffic to any VM on any of the four
servers, in other words all my VMs get IPs from a DHCP Server and can icmp
each other.
I have decent performance moving files, SMB2, from VMs that are on different
servers, aka VM1 on ServerA copies a file to VM2 on ServerB.
I have horrible performance when moving files, SMB2, from VMs that are on
the same server, aka VM1 on ServerA copies a file to VM2 on ServerA. I am
not an expert on how OpenvSwitch works, and I can't discount that my own
stupidity may be behind this, but I am at a loss for what to do to
troubleshoot this.
I have captured packets of a reproducible type or session of network
traffic, aka Logging into a VM with the same account which has a roaming
profile configured. This pulls down about 50MB of data and when logging
into a VM that is on a different server than the file server that hosts the
profile it takes about 11 seconds. When logging into a VM that is on the
same server as the file server it takes well over 20 minutes and never
really succeeds.
What I can see that is different between the two packet captures is the
amount of retransmits: Duplicate ACKs and Out-of-Order packets are insane
when going from vm to vm on the same server.

It seems to me after looking at the traffic in the capture that everything
is trucking along until we get to a large file, say 5MB, then it just falls
apart. On the VM that is on a different server, I can get the file moved
across in only 458 packets, with only 36 TCP ACKed lost segment packets
flagged.
On the VM that is on the same server, I can't get the file moved across even
after 6800+ packets, with 5500 Dup ACKs, Out-of-Order or retransmission
packets flagged.
There has to be something going on that could explain this, but I am at a
loss!  Any help would be greatly appreciated!!
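
To put the two captures side by side, the packet counts quoted above can be turned into a share of flagged packets. This is only a rough sketch: the function name is made up for illustration, and "6800+" is treated as exactly 6800, so the same-server figure is approximate.

```python
def flagged_share(flagged: int, total: int) -> float:
    """Return the fraction of captured packets that were flagged."""
    if total <= 0:
        raise ValueError("total must be positive")
    return flagged / total


# Counts quoted in the message above; "6800+" is taken as 6800 (assumption).
cross_server = flagged_share(36, 458)
same_server = flagged_share(5500, 6800)

print(f"cross-server: {cross_server:.1%} of packets flagged")
print(f"same-server:  {same_server:.1%} of packets flagged")
```

That works out to roughly 8% of packets flagged between servers against roughly 81% on the same server.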
The fact that it only happens when you start to see large packets
likely means that it is related to TCP segmentation offload.  I know
that some versions of the Windows PV drivers on Xen had bugs in this
area so I would look to see if there is a newer version that you can
upgrade to.  I don't know which versions are affected though.
Wouldn't TCP seg offload affect all the traffic not just the traffic that
stays on the bridge? I will go grab the newest version of the pv drivers
and let you know.
It should always be used but the effect depends on what the problem is
and how the other side deals with it.

Ok it looks like it was a TSO issue. Disabling it on two vms on the same
host fixed it. What a pain!

Thanks for your help and insight!
Mike
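
The message does not say how TSO was disabled on the two VMs; on Windows guests that is normally a setting of the network driver. For a Linux interface in a similar setup, here is a minimal sketch, assuming the standard ethtool utility is installed and using a hypothetical interface name vif1.0, of building the commands that show the offload settings and turn TSO off. The helper name is made up for illustration, and the commands are only printed, not executed.

```python
import shlex


def tso_commands(interface: str) -> list[str]:
    """Build ethtool command lines to inspect offloads and disable TSO."""
    show = ["ethtool", "-k", interface]                    # list offload features
    disable = ["ethtool", "-K", interface, "tso", "off"]   # turn TSO off
    return [shlex.join(cmd) for cmd in (show, disable)]


# "vif1.0" is a placeholder; substitute the interface under test.
for line in tso_commands("vif1.0"):
    print(line)
```

Running the printed commands requires root, and the change does not persist across reboots.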
_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss
