Hi guys, thanks for the quick reply:
- The VM issue happens mostly on Windows (one customer seems to be particularly unlucky), but AFAIK it also happens on Linux and on FortiOS (firewall software, not pure Linux). In both cases the VMs run PV drivers (the "Windows PV" or the "CentOS 6.5 x64" OS type).
- We are actually using LACP on the switches, and I also took one of the bond's member interfaces down on the host, although that makes zero sense: the packet already arrived via bond0 to the bond0.XXX VLAN interface (the VTEP), then to the child VXLAN interface (VXLAN on top of VLAN on top of the bond), and then to the bridge, but it was never passed to the VNET.

My expectation is that this is purely an inside-the-host problem, since the packet arrives from the outside physical network at the host's VXLAN/bridge, but not at the VNET. It seems like some QEMU issue, but I found nothing on Google that looks similar to our issue. I have no idea...

On 9 October 2017 at 23:08, Dag Sonstebo <dag.sonst...@shapeblue.com> wrote:

> Hi Andrija,
>
> Do you use NIC bonds? I have seen this before when using active-active
> bonds, and as you say it can be very difficult to troubleshoot and the
> behaviour makes little sense. What can happen is that network traffic is
> load balanced between the two NICs, but the MAC tables on the two
> switches are not updated often enough to keep up with the load-balanced
> traffic. In other words, a MAC address which used to transmit on
> hypervisor eth0 (attached to your first top-of-rack switch) of a bond has
> suddenly, due to load, started transmitting on eth1 (attached to the
> second top-of-rack switch) of the bond; however, the physical switch
> stack still thinks the MAC address lives on eth0, hence traffic is
> dropped until the next time the switches sync their MAC tables.
>
> We used to see this a lot in the past on XenServer; the solution was
> moving to active-passive bond modes, or going up to LACP/802.3ad if your
> hardware allows for it. The same principle will, however, also apply to
> generic Linux bonds.
>
> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
> S: +44 20 3603 0540 | dag.sonst...@shapeblue.com |
> http://www.shapeblue.com <http://www.shapeblue.com/> | Twitter:@ShapeBlue
> <https://twitter.com/#!/shapeblue>
>
>
> On 09/10/2017, 21:52, "Andrija Panic" <andrija.pa...@gmail.com> wrote:
>
> > Hi guys,
> >
> > we have an occasional but serious problem that seems to start
> > happening randomly (i.e. NOT under high load). AFAIK it is not ACS
> > related, purely KVM, but feedback is really welcome.
> >
> > - The VM is reachable in general from everywhere, but not from one
> >   specific IP address?!
> > - The VM is NOT under high load: network traffic is next to zero, same
> >   for CPU/disk...
> > - We mitigate this problem by migrating the VM away to another host,
> >   which is not much of a solution...
> >
> > Description of the problem:
> >
> > We run a ping from the "problematic" source IP address to the
> > problematic VM, and we capture traffic on the KVM host where the
> > problematic VM lives:
> >
> > - tcpdump on the VXLAN interface (physical incoming interface on the
> >   host): we see the packet fine
> > - tcpdump on the BRIDGE: we see the packet fine
> > - tcpdump on the VNET: we DON'T see the packet.
> >
> > In the scenario above, I need to point out that:
> > - we can tcpdump packets from other source IPs on the VNET interface
> >   just fine (as expected), so we should also see this problematic
> >   source IP's packets
> > - we can actually ping in the opposite direction, from the problematic
> >   VM to the problematic "source" IP
> >
> > We checked everything possible, from bridge port forwarding, to
> > MAC-to-VTEP mapping, to many other things: we removed traffic shaping
> > from the VNET interface, there are no iptables/ebtables rules, no STP
> > on the bridge, we removed and rejoined interfaces to the bridge, and we
> > destroyed the bridge and created it manually on the fly.
> >
> > The problem is really crazy and I cannot explain it: no iptables, no
> > ebtables (removed for troubleshooting purposes on this host), and we
> > mitigate it only by migrating the VM away to another host, which is
> > not much of a solution...
> >
> > This is Ubuntu 14.04, QEMU 2.5 (libvirt 1.3.1), stock kernel 3.16-xx,
> > regular bridge (not OVS).
> >
> > Has anyone else ever heard of such a problem? This is not intermittent
> > packet dropping, but a complete blackout/packet drop of some kind...
> >
> > Thanks,
> >
> > --
> >
> > Andrija Panić
>
>
>
> dag.sonst...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London WC2N 4HS, UK
> @shapeblue
>
>
>

--

Andrija Panić
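Dag's explanation of active-active bonds boils down to a timing race: the switch stack keeps forwarding to the port where a MAC was last synced, not to the port it currently transmits from. The toy Python model that follows sketches that race under loudly simplified assumptions: one stack-wide MAC table refreshed only every `sync_interval` ticks, a VM MAC that moves from eth0 to eth1 exactly once, and delivery counted as successful only when the stack's table points at the current port. `SwitchStack`, `simulate`, the MAC and all parameter values are hypothetical and do not describe any real switch firmware.

```python
from dataclasses import dataclass, field


@dataclass
class SwitchStack:
    """Toy switch stack (hypothetical): MACs learned by a member switch only
    become stack-wide after the next periodic sync."""
    sync_interval: int
    table: dict = field(default_factory=dict)    # MAC -> port the stack forwards to
    pending: dict = field(default_factory=dict)  # learned locally, not yet synced

    def learn(self, mac: str, port: str) -> None:
        # A member switch sees a frame from `mac` arriving on `port`.
        self.pending[mac] = port

    def tick(self, now: int) -> None:
        # Every `sync_interval` ticks, pending entries are merged into the stack-wide table.
        if now % self.sync_interval == 0:
            self.table.update(self.pending)
            self.pending.clear()

    def delivers_to(self, mac: str, current_port: str) -> bool:
        # Traffic towards `mac` only arrives if the stack's view matches where the MAC really is.
        return self.table.get(mac) == current_port


def simulate(sync_interval: int, move_at: int, ticks: int) -> list:
    """Return the ticks at which traffic to the VM's MAC is dropped, assuming
    the bond moves the MAC from eth0 to eth1 at tick `move_at`."""
    stack = SwitchStack(sync_interval)
    mac = "52:54:00:aa:bb:cc"  # made-up VM MAC
    dropped = []
    for now in range(ticks):
        port = "eth0" if now < move_at else "eth1"
        stack.learn(mac, port)
        stack.tick(now)
        if not stack.delivers_to(mac, port):
            dropped.append(now)
    return dropped


print(simulate(sync_interval=5, move_at=7, ticks=15))  # -> [7, 8, 9]
```

In this model, frames sent between the move and the next sync (ticks 7 to 9) are lost. Passing the same logical port name for both links (for example a hypothetical port-channel `po1`, which is roughly how LACP/802.3ad presents the bond to the switches) makes the drops disappear, which is the intuition behind Dag's advice; an active-passive bond avoids the move altogether except on failover.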
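For the bridge port forwarding and MAC-to-VTEP checks mentioned in the original report, one place to look at both at once is the forwarding database printed by iproute2's `bridge fdb show`: entries with `master <bridge>` show which bridge port a MAC was learned on, and VXLAN `self` entries with `dst <ip>` show which VTEP a remote MAC maps to. The minimal Python sketch that follows filters that output for one MAC. It assumes the common `<mac> dev <port> [key value ...] [flags]` line layout; the helper name `fdb_entries_for` and the sample interfaces, MACs and VTEP address are hypothetical.

```python
from __future__ import annotations

# Made-up sample in the layout printed by `bridge fdb show`.
SAMPLE = """\
52:54:00:12:34:56 dev vnet3 master br100
fa:16:3e:aa:bb:cc dev vxlan100 master br100
fa:16:3e:aa:bb:cc dev vxlan100 dst 10.0.0.2 self
"""

KEYS = ("master", "dst", "vlan")


def fdb_entries_for(fdb_output: str, mac: str) -> list[dict]:
    """Return every FDB entry for `mac` as a dict with the bridge port
    plus the optional master bridge, VTEP destination and VLAN."""
    mac = mac.lower()
    entries = []
    for line in fdb_output.splitlines():
        tokens = line.split()
        # Skip lines that do not start with "<mac> dev <port>" or belong to another MAC.
        if len(tokens) < 3 or tokens[1] != "dev" or tokens[0].lower() != mac:
            continue
        entry = {"port": tokens[2], **{key: None for key in KEYS}}
        # The remaining tokens mix "key value" pairs with bare flags such as
        # "self" or "permanent"; only the known keys are kept.
        for key, value in zip(tokens[3:], tokens[4:]):
            if key in KEYS:
                entry[key] = value
        entries.append(entry)
    return entries


print(fdb_entries_for(SAMPLE, "52:54:00:12:34:56"))
# -> [{'port': 'vnet3', 'master': 'br100', 'dst': None, 'vlan': None}]
```

Under these assumptions, the problematic VM's MAC would be expected to show a single `master` entry on its own vnet port, and a source sitting behind a remote VTEP would be expected on the VXLAN port, with a `self` entry pointing at that VTEP if the VXLAN device learns or has static per-MAC entries.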