Hi Dag and all,
Yes, we are using active-active (mode 7) for the bond.
VM A1 --> VR A (Isolated Network A) --> VR B (Isolated Network B) --> VM B1
After several rounds of fault isolation, based on packet analysis (see the capture sketch below), it seems to us that:
- the traffic between VM A1 and VR A is normal;
- however, between VR A and VM B1, VR A receives acknowledgements from VM B1 for packets that VR A thinks it has not yet sent through;
- VR A then resets the session, causing the traffic to drop.
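In case it helps anyone reproduce the analysis, this is roughly the capture we ran on VR A (a minimal sketch; the interface name and the two VM addresses are placeholders for our actual setup):

    # On VR A: capture the full session between VM A1 and VM B1 to a file
    # for offline analysis (eth2 and the IPs below are placeholders)
    tcpdump -i eth2 -s 0 -w /tmp/vr-a.pcap host 10.1.1.10 and host 10.2.2.10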
For testing purposes, we turned off TSO (TCP segmentation offload) on the XenServer network adapters with the command 'ethtool -K eth0 tso off', and the issue is just gone; we can run iperf for a couple of hours without any drops.
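In case others want to try the same workaround, this is roughly what we ran on each XenServer host (the PIF UUID is a placeholder, and the other-config key is our understanding of how XenServer persists ethtool settings; please verify against the XenServer docs):

    # Show the current offload settings on the physical NIC
    ethtool -k eth0 | grep tcp-segmentation-offload
    # Disable TSO at runtime (capital -K sets a feature, lowercase -k only shows it)
    ethtool -K eth0 tso off
    # Persist the setting across reboots via the PIF's other-config
    xe pif-list device=eth0 params=uuid
    xe pif-param-set uuid=<pif-uuid> other-config:ethtool-tso="off"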
Does this make sense? Is there any improvement that can be implemented on the ACS side?
Thanks!
On 2019-02-22 at 23:20, "Haijiao" <18602198...@163.com> wrote:
Thanks Dag, you are always helpful!
We will look into what you shared and come back.
On 2019-02-22 at 17:26, "Dag Sonstebo" wrote:
Hi Haijiao,
We've come across similar things in the past. In short - what is your XenServer
bond mode? Is it active-active (mode 7) or LACP (mode 4)? (see
https://support.citrix.com/article/CTX137599)
In short, if your switches don't keep up with MAC address changes on the XS hosts, then you will get traffic flapping with intermittent loss of connectivity (the root cause is that a MAC address moves to another uplink, but the switch only checks for changes every X seconds, so it takes a while for it to catch up).
LACP mode 4 has a much more robust mechanism for this, but obviously needs to be configured on both the XS and switch ends. Normal active-active (mode 7) seems to always cause problems.
My general advice would be to simplify and just go active-passive (mode 1); unless you really need the bandwidth, this gives you a much more stable network backend.
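If it helps, a quick sketch of checking and changing the bond mode from the XenServer CLI (run on the pool master; the UUID is a placeholder):

    # List bonds with their current mode
    # (balance-slb = active-active, active-backup = active-passive)
    xe bond-list params=uuid,mode,links-up
    # Switch an existing bond to active-passive
    xe bond-set-mode uuid=<bond-uuid> mode=active-backup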
Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue
On 22/02/2019, 07:14, "Haijiao" <18602198...@163.com> wrote:
Hi, Devs and Community Users
To be more specific, our environment is built with
* 2 Dell R740XD Servers + Dell Compellent Storage w/ iSCSI
* Each server equipped with two Mellanox ConnectX-4 Lx 25GbE network adapters, and configured with bond mode (active-active) in XenServer
* CloudStack 4.11.2 LTS + XenServer 7.1 CU2 (LTS) Enterprise
Everything works fine with a shared network, but the weird thing is that if we set up 2 isolated networks and use 'iperf', 'wget' or 'scp' to test the network performance between two VMs located in these 2 isolated networks, the traffic drops to zero in about 200-300 seconds, even though we are still able to ping or SSH to VM B1 from A1 or vice versa.
VM A1 --> VR A (Isolated Network A) --> VR B (Isolated Network B) --> VM B1
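For completeness, this is the kind of test we ran between the two VMs (the address below is a placeholder for VM B1's actual isolated-network IP):

    # On VM B1 (isolated network B): start an iperf server
    iperf -s
    # On VM A1 (isolated network A): run a long client test against VM B1;
    # throughput drops to zero after roughly 200-300 seconds
    iperf -c 10.2.2.10 -t 3600 -i 10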
We have checked the configuration on the switches and upgraded the Mellanox driver for XenServer, but no luck.
Meanwhile, we cannot reproduce this issue in another environment (XenServer 7.1 CU2 + ACS 4.11.2 + Intel GbE network).
It seems it might be related to the Mellanox adapter, but we have no idea what we could possibly be missing in this case.
Any advice would be highly appreciated! Thank you!
On 2019-02-22 at 13:09, "gu haven" wrote:
Hi all,
When I try iperf, wget or scp, the connection breaks after about 200 seconds. Do we need any optimization in the VR?
Environment information below:
CloudStack 4.11.2
XenServer 7.1 CU2 Enterprise
NIC: MLNX 25GbE 2P ConnectX4LX
Bond mode in XenServer: active-active