I have several XenSource servers running many PV-on-HVM Windows DomUs, and I have run into a pretty weird problem. Here are my details:
Kernel: 3.5.0-rc2
OpenvSwitch module: Built-in from upstream (i.e. I did not install the kernel module when building OpenvSwitch)
OpenvSwitch userland tools: version 1.4.0

I have a single bridge with two fake bridges.
I have configured a LACP bond with 4 physical NICs that connects to a PortChannel on a 6509.
I set up the native VLAN on the 6509 to be 102.
I have configured the bond with vlan_mode=native-untagged and tag=102.
All my VMs are added to the fake bridge associated with VLAN 102.
I have four servers configured this way, all connected to the same 6509: ServerA, ServerB, ServerC, and ServerD.
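
For anyone trying to reproduce this, the layout above corresponds roughly to the ovs-vsctl calls printed by the sketch below. The names br0, bond0, vlan102 and eth0 through eth3 are placeholders rather than the real ones, lacp=active is an assumption (only "LACP" is known), and the second fake bridge is left out because its VLAN is not stated. The script only prints the commands; it does not run them.

```python
import shlex

# Placeholder names: none of these come from the actual servers.
BRIDGE = "br0"
FAKE_BRIDGE = "vlan102"
BOND = "bond0"
NICS = ["eth0", "eth1", "eth2", "eth3"]  # the four bonded physical NICs
VLAN = 102                               # native VLAN on the 6509


def build_commands():
    """Return the ovs-vsctl invocations as argument lists."""
    return [
        # Parent bridge.
        ["ovs-vsctl", "add-br", BRIDGE],
        # Bond of the four physical NICs on the parent bridge.
        ["ovs-vsctl", "add-bond", BRIDGE, BOND, *NICS],
        # LACP mode is assumed to be "active"; VLAN settings are from the post.
        ["ovs-vsctl", "set", "port", BOND, "lacp=active",
         "vlan_mode=native-untagged", f"tag={VLAN}"],
        # Fake bridge for VLAN 102; VM interfaces are attached to this one.
        ["ovs-vsctl", "add-br", FAKE_BRIDGE, BRIDGE, str(VLAN)],
    ]


if __name__ == "__main__":
    for cmd in build_commands():
        print(shlex.join(cmd))
```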

I have no problem sending and receiving traffic to any VM on any of the four servers; in other words, all my VMs get IPs from a DHCP server and can ping each other. I get decent performance moving files over SMB2 between VMs that are on different servers, e.g. VM1 on ServerA copying a file to VM2 on ServerB.

I get horrible performance moving files over SMB2 between VMs that are on the same server, e.g. VM1 on ServerA copying a file to VM2 on ServerA. I am not an expert on how OpenvSwitch works, and I can't discount that my own stupidity may be behind this, but I am at a loss as to how to troubleshoot it.

I have captured packets for a reproducible network session: logging into a VM with the same account, which has a roaming profile configured. This pulls down about 50MB of data. When logging into a VM that is on a different server than the file server hosting the profile, it takes about 11 seconds. When logging into a VM that is on the same server as the file server, it takes well over 20 minutes and never really succeeds.
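
To put those login times in perspective, here is a rough back-of-the-envelope calculation. It assumes the full 50MB is transferred in both cases and treats 20 minutes as a lower bound on the same-server time, so the same-server figure is an upper bound on throughput, not a measurement.

```python
PROFILE_MB = 50           # approximate roaming profile size
CROSS_SERVER_S = 11       # login time, VM and file server on different hosts
SAME_SERVER_S = 20 * 60   # "well over 20 minutes", so a lower bound


def mbit_per_s(megabytes, seconds):
    """Average throughput in Mbit/s for a transfer of the given size."""
    return megabytes * 8 / seconds


cross = mbit_per_s(PROFILE_MB, CROSS_SERVER_S)
same = mbit_per_s(PROFILE_MB, SAME_SERVER_S)
slowdown = SAME_SERVER_S / CROSS_SERVER_S

print(f"cross-server: ~{cross:.1f} Mbit/s")       # ~36.4 Mbit/s
print(f"same-server:  at most ~{same:.2f} Mbit/s")  # at most ~0.33 Mbit/s
print(f"slowdown:     at least ~{slowdown:.0f}x")   # at least ~109x
```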

The difference I can see between the two packet captures is the number of retransmits, Duplicate ACKs, and Out-of-Order packets, which is insane when going from VM to VM on the same server.

Looking at the traffic in the capture, it seems that everything is trucking along until we get to a large file, say 5MB, and then it just falls apart. On the VM that is on a different server, the file moves across in only 458 packets, with only 36 packets flagged as TCP ACKed lost segment. On the VM that is on the same server, the file still has not moved across after 6800+ packets, with 5500 packets flagged as Dup ACK, Out-of-Order, or retransmission.
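
Expressed as a share of captured packets, the gap looks like this. It is a rough calculation from the counts above; 6800 is treated as the total for the same-server capture even though the real count is somewhat higher and the transfer never finished.

```python
def flagged_share(flagged, total):
    """Percentage of captured packets that carry a TCP analysis flag."""
    return 100.0 * flagged / total


cross_pct = flagged_share(36, 458)     # different servers
same_pct = flagged_share(5500, 6800)   # same server, approximate total

print(f"cross-server: {cross_pct:.1f}% of packets flagged")  # 7.9%
print(f"same-server:  {same_pct:.1f}% of packets flagged")   # 80.9%
```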

There has to be something going on that could explain this, but I am at a loss! Any help would be greatly appreciated!!
Mike
