Greetings! I actually posted about this several months(?) ago, though you articulated it better than I did.
I run a 2-node HA VM cluster with KVM/Pacemaker on top of DRBD 8.4, on hardware very comparable to yours, and have experienced similar symptoms during backup procedures. When it's really bad, one node will fence the other because the remote disk becomes unresponsive past the DRBD timeout threshold (auto-calculated at around 42 seconds). My only workaround has been to keep all VMs on a single node at a time and manually move them all to the other node periodically; this setup tolerates the I/O spike much better. However, we don't get the performance benefit of having both nodes active, not to mention the added administrative overhead.
-Chris
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
