On Wed, Aug 29, 2012 at 7:43 PM, Mike Collins
<mike.a.coll...@ark-net.org> wrote:
> On Aug 29, 2012, at 9:27 PM, Jesse Gross <je...@nicira.com> wrote:
>
>> On Wed, Aug 29, 2012 at 6:19 PM, Michael A. Collins
>> <mike.a.coll...@ark-net.org> wrote:
>>>
>>> I have several xensource servers running lots of PV-on-HVM Windows DomUs,
>>> and I have a pretty weird problem.  Here are my details:
>>> Kernel: 3.5.0-rc2
>>> OpenvSwitch module: built in from upstream (i.e. I did not install the
>>> kernel module when building OpenvSwitch)
>>> OpenvSwitch userland tools: version 1.4.0
>>>
>>> I have a single Bridge with two fake-bridges.
>>> I have configured a LACP bond with 4 physical nics that connects to a
>>> PortChannel on a 6509.
>>> I set up the native VLAN on the 6509 to be 102.
>>> I have configured the bond with vlan_mode=native-untagged and tag=102.
>>> All my vms are added to the fake-bridge associated with vlan 102.
>>> I have four servers configured this way all connected to the same 6509,
>>> ServerA, ServerB, ServerC, and ServerD.
>>>
>>> I have no problem sending and receiving traffic to any VM on any of the
>>> four servers; in other words, all my VMs get IPs from a DHCP server and
>>> can ping each other.
>>> I have decent performance moving files over SMB2 between VMs that are on
>>> different servers, e.g. VM1 on ServerA copying a file to VM2 on ServerB.
>>>
>>> I have horrible performance when moving files over SMB2 between VMs that
>>> are on the same server, e.g. VM1 on ServerA copying a file to VM2 on
>>> ServerA.  I am not an expert on how OpenvSwitch works, and I can't rule
>>> out that my own stupidity is behind this, but I am at a loss for how to
>>> troubleshoot it.
>>>
>>> I have captured packets for a reproducible network session: logging into
>>> a VM with the same account, which has a roaming profile configured.  This
>>> pulls down about 50MB of data.  When logging into a VM that is on a
>>> different server than the file server that hosts the profile, it takes
>>> about 11 seconds.  When logging into a VM that is on the same server as
>>> the file server, it takes well over 20 minutes and never really succeeds.
>>>
>>> What differs between the two packet captures is the number of
>>> retransmits, Duplicate ACKs and Out-of-Order packets, which is insane
>>> when going from VM to VM on the same server.
>>>
>>> Looking at the traffic in the capture, it seems that everything is
>>> trucking along until we get to a large file, say 5MB, and then it just
>>> falls apart.  On the VM that is on a different server, I can get the file
>>> moved across in only 458 packets, with only 36 TCP ACKed lost segment
>>> packets flagged.
>>> On the VM that is on the same server, I can't get the file moved across
>>> even after 6800+ packets, with 5500 Dup ACKs, Out-of-Order or
>>> retransmission packets flagged.
>>>
>>> There has to be something going on that could explain this, but I am at a
>>> loss!  Any help would be greatly appreciated!!
>>
>>
>> The fact that it only happens when you start to see large packets
>> likely means that it is related to TCP segmentation offload.  I know
>> that some versions of the Windows PV drivers on Xen had bugs in this
>> area so I would look to see if there is a newer version that you can
>> upgrade to.  I don't know which versions are affected though.
>>
> Wouldn't TCP segmentation offload affect all the traffic, not just the
> traffic that stays on the bridge?  I will go grab the newest version of
> the PV drivers and let you know.

TSO should always be in use, but the effect depends on what the problem
is and how the other side deals with it.
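
The capture figures quoted above give a rough way to check whether
offload is in play even on the path that works.  The sketch below is
only back-of-the-envelope arithmetic: the file size is the "say 5MB"
estimate from the report, the 1460-byte MSS is an assumption (1500-byte
MTU, IPv4 and TCP headers without options), and the names are
illustrative.

```python
import math

# Figures taken from the report; the file size is only the "say 5MB" estimate.
FILE_BYTES = 5 * 1024 * 1024   # approximate size of the large file
PACKETS_SEEN = 458             # packets counted in the cross-server capture

# Assumption: standard 1500-byte MTU, 20-byte IPv4 + 20-byte TCP headers.
MSS_BYTES = 1460


def average_bytes_per_packet(total_bytes, packets):
    """Average payload each captured packet would have to carry."""
    return total_bytes / packets


def segments_on_wire(total_bytes, mss):
    """Minimum number of MSS-sized segments needed without any offload."""
    return math.ceil(total_bytes / mss)


avg = average_bytes_per_packet(FILE_BYTES, PACKETS_SEEN)
needed = segments_on_wire(FILE_BYTES, MSS_BYTES)

print(f"average bytes per captured packet: {avg:.0f}")
print(f"MSS-sized segments needed on the wire: {needed}")
print(f"captured packets exceed the MSS: {avg > MSS_BYTES}")
```

Under these assumptions the average captured packet is several times
larger than a single MSS, which would suggest the capture point sees
packets before segmentation (or after receive coalescing) on the
cross-server path as well.  The exact ratio depends on the real file
size and on whether the 458 packets include ACKs.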
_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss
