On Wed, Aug 29, 2012 at 6:19 PM, Michael A. Collins <mike.a.coll...@ark-net.org> wrote:
> I have several xensource servers running lots of PV-On-HVM Windows DomUs and
> I have a pretty weird problem. Here are my details:
> Kernel: 3.5.0-rc2
> OpenvSwitch module: Built-in from upstream (aka did not install kernel
> module when building OpenvSwitch)
> OpenvSwitch userland tools: version 1.4.0
>
> I have a single Bridge with two fake-bridges.
> I have configured a LACP bond with 4 physical nics that connects to a
> PortChannel on a 6509.
> I setup the native vlan on the 6509 to be 102.
> I have configured the bond with vlan_mode=native-untagged and tag=102.
> All my vms are added to the fake-bridge associated with vlan 102.
> I have four servers configured this way all connected to the same 6509,
> ServerA, ServerB, ServerC, and ServerD.
>
> I have no problem sending and receiving traffic to any VM on any of the four
> servers, in other words all my VMs get IPs from a DHCP Server and can icmp
> each other.
> I have decent performance moving files, SMB2, from VMs that are on different
> servers, aka VM1 on ServerA copies a file to VM2 on ServerB.
>
> I have horrible performance when moving files, SMB2, from VMs that are on
> the same server, aka VM1 on ServerA copies a file to VM2 on ServerA. I am
> not an expert on how OpenvSwitch works, and I can't discount that my own
> stupidity may be behind this, but I am at a loss for what to do to
> troubleshoot this.
>
> I have captured packets of a reproducible type or session of network
> traffic, aka Logging into a VM with the same account which has a roaming
> profile configured. This pulls down about 50MB of data and when logging
> into a VM that is on a different server than the file server that hosts the
> profile it takes about 11 seconds. When logging into a VM that is on the
> same server as the file server it takes well over 20 minutes and never
> really succeeds.
>
> What I can see that is different from the two packet captures are the amount
> of retransmits, Duplicate ACKs and Out-of-Order packets are insane when
> going from vm to vm on the same server.
>
> It seems to me after looking at the traffic in the capture that everything
> is trucking along until we get to a large file, say 5MB, then it just falls
> apart. On the VM that is on a different server, I can get the file moved
> across in only 458 packets, with only 36 TCP ACKed lost segment packets
> flagged.
> On the VM that is on the same server, I can't get the file moved across even
> after 6800+ packets, with 5500 Dup ACKs, Out-of-Order or retransmission
> packets flagged.
>
> There has to be something going on that could explain this, but I am at a
> loss! Any help would be greatly appreciated!!

The fact that it only happens when you start to see large packets
likely means that it is related to TCP segmentation offload. I know
that some versions of the Windows PV drivers on Xen had bugs in this
area so I would look to see if there is a newer version that you can
upgrade to. I don't know which versions are affected though.
_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss
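
One way to test the segmentation-offload theory before hunting for newer drivers is to look at the offload settings of the interface in question and, as an experiment, turn TSO off there. The following is only a rough sketch under stated assumptions, not something taken from this thread: it assumes the standard Linux `ethtool` utility is installed, the device name `vif1.0` is a made-up example, and the helper names are hypothetical. The feature names printed by `ethtool -k` vary between ethtool versions, and whether changing the setting on a given interface alters the behaviour depends on the drivers involved.

```python
import subprocess


def parse_offload_features(ethtool_output):
    """Turn the text printed by `ethtool -k <dev>` into {feature_name: bool}."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        words = value.split()
        # Skip the "Features for <dev>:" header and anything that is not on/off.
        if not sep or not words or words[0] not in ("on", "off"):
            continue
        features[name.strip()] = (words[0] == "on")
    return features


def tso_off_command(device):
    """Build the argv list for `ethtool -K <device> tso off` (usually needs root)."""
    return ["ethtool", "-K", device, "tso", "off"]


def show_offloads(device):
    """Run `ethtool -k <device>` and return the parsed feature table."""
    result = subprocess.run(
        ["ethtool", "-k", device],
        check=True, capture_output=True, text=True,
    )
    return parse_offload_features(result.stdout)


# Example usage (hypothetical device name, run as root on the host):
#   print(show_offloads("vif1.0"))
#   subprocess.run(tso_off_command("vif1.0"), check=True)
```

If the same-server copy recovers with the offload disabled, that supports the offload explanation; if nothing changes, the experiment can be undone with `tso on` in place of `tso off`.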