On 09/14/2013 07:18 AM, Lars Marowsky-Bree wrote:
> On 2013-09-14T00:28:30, Tom Parker <[email protected]> wrote:
>
>> Does anyone know of a good way to prevent pacemaker from declaring a vm
>> dead if it's rebooted from inside the vm? It seems to be detecting the
>> vm as stopped for the brief moment between shutting down and starting
>> up.
>
> Hrm. Good question. Because to the monitor, it really looks as if the VM
> is temporarily gone, and it doesn't know ... Perhaps we need to keep
> looking for it for a few seconds.
>
> Can you kindly file a bug report here so it doesn't get lost
> https://github.com/ClusterLabs/resource-agents/issues ?

Submitted (Issue #308).

>> Often this causes the cluster to have two copies of the same vm if the
>> locks are not set properly (which I have found to be unreliable): one
>> that is managed and one that is abandoned.
>
> *This* however is really, really worrisome and sounds like data
> corruption. How is this happening?

It definitely leads to data corruption, and I think it has to do with locking not working properly on my LVM partitions. It seems to happen mostly on clusters where I am using LVM slices on an MSA as shared storage (they don't seem to lock at the LV level) and where placement-strategy is set to utilization. If Xen reboots and the cluster declares the VM dead, it tries to start it on another node that has more free resources instead of the node where it was already running. It doesn't happen consistently enough for me to detect a pattern, and it seems to never happen on my QA system, where I can actually cause corruption without anyone getting mad. If I can isolate how it happens I will file a bug.

> The work-around right now is to put the VM resource into maintenance
> mode for the reboot, or to reboot it via a stop/start of the cluster
> manager.
>
> Regards,
>     Lars
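For anyone searching the archives: the workaround Lars describes can be done per resource rather than cluster-wide. A minimal sketch assuming crmsh; the resource name `vm1` is hypothetical, and on older crmsh versions without the `resource maintenance` subcommand, `resource unmanage` / `resource manage` achieve a similar effect:

```shell
# Tell Pacemaker to stop acting on monitor results for this one VM
# while the guest reboots itself (resource name "vm1" is hypothetical).
crm resource maintenance vm1 on    # or, on older crmsh: crm resource unmanage vm1

# ... reboot the guest from inside the VM, wait for it to come back ...

# Hand the resource back to the cluster.
crm resource maintenance vm1 off   # or: crm resource manage vm1
```

Note that while the resource is in maintenance/unmanaged mode the cluster will not recover it if it genuinely fails, so this window should be kept short.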
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
