On Wed, Aug 29, 2012 at 7:43 PM, Mike Collins <mike.a.coll...@ark-net.org> wrote:
> On Aug 29, 2012, at 9:27 PM, Jesse Gross <je...@nicira.com> wrote:
>
>> On Wed, Aug 29, 2012 at 6:19 PM, Michael A. Collins
>> <mike.a.coll...@ark-net.org> wrote:
>>>
>>> I have several XenSource servers running lots of PV-on-HVM Windows DomUs,
>>> and I have a pretty weird problem. Here are my details:
>>> Kernel: 3.5.0-rc2
>>> Open vSwitch module: built-in from upstream (i.e. I did not install the
>>> kernel module when building Open vSwitch)
>>> Open vSwitch userland tools: version 1.4.0
>>>
>>> I have a single bridge with two fake bridges.
>>> I have configured an LACP bond with 4 physical NICs that connects to a
>>> PortChannel on a 6509.
>>> I set the native VLAN on the 6509 to 102.
>>> I have configured the bond with vlan_mode=native-untagged and tag=102.
>>> All my VMs are added to the fake bridge associated with VLAN 102.
>>> I have four servers configured this way, all connected to the same 6509:
>>> ServerA, ServerB, ServerC, and ServerD.
>>>
>>> I have no problem sending and receiving traffic to any VM on any of the
>>> four servers; in other words, all my VMs get IPs from a DHCP server and
>>> can ping each other.
>>> I have decent performance moving files (SMB2) between VMs that are on
>>> different servers, e.g. VM1 on ServerA copies a file to VM2 on ServerB.
>>>
>>> I have horrible performance when moving files (SMB2) between VMs that are
>>> on the same server, e.g. VM1 on ServerA copies a file to VM2 on ServerA.
>>> I am not an expert on how Open vSwitch works, and I can't rule out that
>>> my own stupidity is behind this, but I am at a loss for how to
>>> troubleshoot it.
>>>
>>> I have captured packets from a reproducible session of network traffic:
>>> logging into a VM with the same account, which has a roaming profile
>>> configured.
>>> This pulls down about 50MB of data. When logging into a VM that is on a
>>> different server than the file server hosting the profile, it takes
>>> about 11 seconds. When logging into a VM that is on the same server as
>>> the file server, it takes well over 20 minutes and never really
>>> succeeds.
>>>
>>> What I can see that differs between the two packet captures is the
>>> number of retransmissions, duplicate ACKs, and out-of-order packets,
>>> which is insane when going from VM to VM on the same server.
>>>
>>> After looking at the traffic in the capture, it seems to me that
>>> everything is trucking along until we get to a large file, say 5MB, and
>>> then it just falls apart. For the VM on a different server, the file
>>> moves across in only 458 packets, with only 36 flagged as "TCP ACKed
>>> lost segment". For the VM on the same server, the file still has not
>>> moved across after 6800+ packets, with 5500 flagged as duplicate ACKs,
>>> out-of-order, or retransmissions.
>>>
>>> There has to be something going on that could explain this, but I am at
>>> a loss! Any help would be greatly appreciated!
>>
>>
>> The fact that it only happens when you start to see large packets
>> likely means that it is related to TCP segmentation offload. I know
>> that some versions of the Windows PV drivers on Xen had bugs in this
>> area, so I would look to see if there is a newer version that you can
>> upgrade to. I don't know which versions are affected, though.
>>
> Wouldn't TCP segmentation offload affect all the traffic, not just the
> traffic that stays on the bridge? I will go grab the newest version of
> the PV drivers and let you know.
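For anyone reproducing this topology, the setup described in the quoted message might be built roughly as follows. This is a hedged sketch, not something posted in the thread: the names `br0`, `br102`, `bond0`, and `eth0` through `eth3` are hypothetical, `lacp=active` is an assumption about how the LACP bond was configured, and the second fake bridge is left out because its VLAN is not given. The helper only prints the `ovs-vsctl` commands rather than running them.

```python
"""Hypothetical sketch of the ovs-vsctl commands for the described setup.

Assumed (not stated in the thread): bridge "br0", fake bridge "br102",
bond port "bond0", NICs eth0..eth3, and active LACP on the bond.
"""
import shlex


def build_commands(bridge="br0", bond="bond0",
                   nics=("eth0", "eth1", "eth2", "eth3"), native_vlan=102):
    """Return one ovs-vsctl argument list per configuration step."""
    fake_bridge = f"br{native_vlan}"  # hypothetical naming scheme
    return [
        # Parent bridge that carries all VLANs.
        ["ovs-vsctl", "add-br", bridge],
        # LACP bond of the four physical NICs toward the 6509 PortChannel.
        ["ovs-vsctl", "add-bond", bridge, bond, *nics, "lacp=active"],
        # Untagged frames on the trunk are treated as the native VLAN.
        ["ovs-vsctl", "set", "Port", bond,
         "vlan_mode=native-untagged", f"tag={native_vlan}"],
        # Fake bridge on that VLAN; the VM vifs would be added here.
        ["ovs-vsctl", "add-br", fake_bridge, bridge, str(native_vlan)],
    ]


if __name__ == "__main__":
    for argv in build_commands():
        # Print for review; pass argv to subprocess.run() to actually apply.
        print(shlex.join(argv))
```

Keeping each command as an argument list avoids shell quoting issues if the lists are later handed to `subprocess.run()`.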

It should always be used, but the effect depends on what the problem is
and how the other side deals with it.
_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss
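A simplified model may help with the question of why only same-host traffic suffers. With TCP segmentation offload, the sending stack hands its driver one large buffer, and the split into MSS-sized segments happens later. Traffic leaving the host has to be segmented before it reaches the wire, whereas traffic that stays on the bridge may be delivered to the receiving VM's PV driver still as one large frame, so a driver bug in that path would only show up between VMs on the same server. That is a plausible reading of Jesse's answer, not something the thread confirms. The sketch assumes a 1460-byte MSS and ignores headers, checksums, flags, and sequence-number wraparound.

```python
"""Simplified model of TCP segmentation offload (TSO/GSO).

Assumptions: a 1460-byte MSS (typical for a 1500-byte Ethernet MTU);
TCP/IP headers, checksums, flags, and sequence wraparound are ignored.
"""
from dataclasses import dataclass
from typing import List

MSS = 1460  # assumed maximum segment size in bytes


@dataclass
class Segment:
    seq: int     # sequence number of the first payload byte
    length: int  # payload bytes carried by this segment


def segment(start_seq: int, payload_len: int, mss: int = MSS) -> List[Segment]:
    """Split one large send into MSS-sized segments, as TSO/GSO would."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    offset = 0
    while offset < payload_len:
        size = min(mss, payload_len - offset)
        segments.append(Segment(seq=start_seq + offset, length=size))
        offset += size
    return segments


if __name__ == "__main__":
    # One 64 KiB send, as a guest might hand it to a TSO-capable driver.
    segs = segment(start_seq=1000, payload_len=64 * 1024)
    print(len(segs), "segments; last one carries", segs[-1].length, "bytes")
```

Under these assumptions, a single 64 KiB send becomes 45 segments on the wire, the last one carrying 1296 bytes; on the same-host path, the receiving driver may instead see that send as one frame.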