Hi, Dag and All
Yes, we are using active-active (mode 7) for the bond.

VM A1 ---> VR A (Isolated Network A) ----> VR B (Isolated Network B) ----> VM B1

After several rounds of isolation testing, based on packet analysis, it seems to us that:
- the traffic between VM A1 and VR A is normal;
- between VR A and VM B1, however, VR A receives acknowledgements from VM B1 for packets which VR A thinks it has not yet sent through;
- VR A then resets the session, causing the traffic to drop.

For testing purposes we turned off TSO (TCP segmentation offload) on the XenServer network adapters with the command 'ethtool -k eth0 tso off', and the issue was just gone - we could run iperf for a couple of hours without any drop.

Does that make sense? Is there any improvement that could be implemented on the ACS side?

Thanks!

On 2019-02-22 23:20, "Haijiao" <18602198...@163.com> wrote:

Thanks Dag, you are always helpful! We will look into what you shared and come back.

On 2019-02-22 17:26, "Dag Sonstebo" <dag.sonst...@shapeblue.com> wrote:

Hi Haijiao,

We've come across similar things in the past. In short - what is your XenServer bond mode? Is it active-active (mode 7) or LACP (mode 4)? (See https://support.citrix.com/article/CTX137599.)

If your switches don't keep up with MAC address changes on the XS hosts, you will get traffic flapping with intermittent loss of connectivity (the root cause is that a MAC address moves to another uplink, but the switch only checks for changes every X seconds, so it takes a while for it to catch up). LACP (mode 4) has a much more robust mechanism for this, but obviously needs to be configured on both the XS and switch ends. Plain active-active (mode 7) seems to always cause problems.

My general advice would be to simplify and just go active-passive (mode 1) - unless you really need the bandwidth, this gives you a much more stable network backend.

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 22/02/2019, 07:14, "Haijiao" <18602198...@163.com> wrote:

Hi, Devs and Community Users

To be more specific, our environment is built with:
* 2 Dell R740XD servers + Dell Compellent storage w/ iSCSI
* each server equipped with two Mellanox ConnectX-4 Lx 25GbE network adapters, configured in an active-active bond in XenServer
* CloudStack 4.11.2 LTS + XenServer 7.1 CU2 (LTS) Enterprise

Everything goes fine with shared networks, but the weird thing is that if we set up 2 isolated networks and use 'iperf', 'wget' or 'scp' to test the network performance between two VMs located in these 2 isolated networks, the traffic drops to zero in about 200-300 seconds, even though we are still able to ping or SSH VM B1 from VM A1 and vice versa.

VM A1 ---> VR A (Isolated Network A) ----> VR B (Isolated Network B) ----> VM B1

We have checked the configuration on the switches and upgraded the Mellanox driver for XenServer, but no luck. Meanwhile, we cannot reproduce this issue in another environment (XenServer 7.1 CU2 + ACS 4.11.2 + Intel GbE network). It seems it might be related to the Mellanox adapter, but we have no idea what we could possibly be missing in this case.

Any advice would be highly appreciated! Thank you!

On 2019-02-22 13:09, "gu haven" <gumin...@hotmail.com> wrote:

Hi, all

When I try iperf, wget or scp, the connection breaks after 200 seconds. Do we need any optimization in the VR?
Environment information below:
CloudStack 4.11.2
XenServer 7.1 CU2 Enterprise
NIC: MLNX 25GbE 2P ConnectX-4 Lx
Bond mode in XenServer: active-active
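For anyone trying to reproduce this, the failing test described above boils down to something like the following sketch. The VM address is a placeholder, iperf2 syntax (default port 5001) is assumed, and eth2 stands for whichever VR interface actually carries the traffic - check with 'ip addr' first:

    # On VM B1: start an iperf server
    iperf -s

    # On VM A1: run a long TCP test towards VM B1; with TSO on, the
    # throughput drops to zero after roughly 200-300 seconds
    iperf -c <VM_B1_IP> -t 600 -i 10

    # Meanwhile, capture on VR A to see the unexpected ACKs and the RST
    # that tears the session down
    tcpdump -ni eth2 tcp port 5001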
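On the workaround itself: note that in current ethtool the lowercase -k flag only shows the offload settings, while capital -K changes them, and the change does not survive a reboot. A sketch of how it can be made persistent on XenServer, assuming the PIF other-config ethtool keys are honoured on this build (they are documented for XenServer PIFs, but verify after a reboot):

    # One-off change in dom0 (lost on reboot):
    ethtool -k eth0 | grep tcp-segmentation-offload
    ethtool -K eth0 tso off

    # Persistent: set the ethtool key on the PIF so XenServer re-applies
    # it whenever the interface is brought up. In a pool, pif-list
    # returns one PIF per host for the same device name.
    PIF_UUIDS=$(xe pif-list device=eth0 --minimal)
    for uuid in ${PIF_UUIDS//,/ }; do
        xe pif-param-set uuid=$uuid other-config:ethtool-tso="off"
    done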
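And for Dag's suggestion on bond modes, the mode can be checked and changed from the pool master without recreating the bond - a sketch, where active-backup corresponds to the active-passive (mode 1) recommended above:

    # Show the bonds and their current mode (balance-slb is XenServer's
    # active-active, active-backup is active-passive, lacp is LACP)
    xe bond-list params=uuid,mode

    # Switch a bond to active-passive; run on the pool master and expect
    # a brief network blip while the bond reconfigures
    xe bond-set-mode uuid=<bond_uuid> mode=active-backup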