Hey guys,
A few days ago we noticed that some of our VPSes lost network connection
after a live migration using QEMU/KVM. We started investigating and
found that the problem affects at least guests running FreeBSD,
Debian 7 (Linux 3.2.81-1) and some flavors of CentOS. Debian 8
(Linux 3.16.7-ckt25-2) was not affected. All guests use the virtio
drivers. We also noted that the problem was not present in
QEMU 2.4.0, but was present in QEMU 2.6.0. The configuration is
identical for both QEMU versions. We use libvirt 1.3.5 to manage QEMU/KVM
(the same libvirt version with both QEMU versions).
After some false starts we managed to reliably reproduce the issue with
a Debian 7 VM. What we noticed was that Debian 8 sends out gratuitous ARP
replies. Debian 7 did not appear to send those. After some further
digging we found out that on QEMU 2.4.0 a Debian 7 guest actually sends
gratuitous RARP packets after a live migration instead of normal ARP
packets, and that those were not present when migrating on QEMU 2.6.0.
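
For reference, here is a minimal sketch of the kind of frame we mean. It
assumes the announcement uses the standard RARP layout (EtherType 0x8035,
opcode 3, "request reverse") sent to the broadcast address with all-zero
IP fields. The function name and the example MAC address are made up for
illustration, and the exact frame QEMU emits should be checked against
its source.

```python
import struct

ETH_P_RARP = 0x8035      # EtherType for RARP
ARPOP_RREQUEST = 3       # RARP "request reverse" opcode
MIN_FRAME_LEN = 60       # minimum Ethernet frame length without FCS


def build_rarp_announce(mac: bytes) -> bytes:
    """Build a broadcast RARP frame announcing `mac` (hypothetical helper)."""
    if len(mac) != 6:
        raise ValueError("MAC address must be exactly 6 bytes")

    broadcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType
    eth = broadcast + mac + struct.pack("!H", ETH_P_RARP)
    # ARP/RARP header: hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # hardware address length 6, protocol address length 4, opcode
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, ARPOP_RREQUEST)
    # Sender MAC, sender IP (0.0.0.0), target MAC, target IP (0.0.0.0)
    arp += mac + b"\x00" * 4 + mac + b"\x00" * 4
    # Pad with zeros up to the minimum Ethernet frame length
    return (eth + arp).ljust(MIN_FRAME_LEN, b"\x00")


# Example MAC address, chosen only for illustration
frame = build_rarp_announce(bytes.fromhex("525400123456"))
```

The point of such a frame is that switches relearn which port the guest's
MAC address now lives on after the migration. A capture on the destination
host's bridge with a filter such as `rarp` should show whether these
frames are being sent.
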
We then started narrowing down the responsible commits using git bisect
and eventually ended up with the last good commit being
d62241eb6da9bd2517f07b3219ba4208b90b4e0d and the last bad commit we
tested was 7ef7bc8586fb0d41742a896b532c7afa2bbb7f84. This is a range of
6 commits introducing the NetFilter functionality. We couldn't
pinpoint it more accurately, so we decided to email you guys to see if
you can figure out what's up.
Regards,
Robin Geuze
TransIP BV