>>> Tim Serong <[email protected]> wrote on 06.02.2013 at 03:50 in message <[email protected]>:
> On 02/05/2013 10:31 PM, Lars Marowsky-Bree wrote:
> > On 2013-02-05T11:36:30, Ulrich Windl <[email protected]>
> > wrote:
> >
> > This looks like a support incident to me. Hard to diagnose without full
> > logs.
> >
> >> Let me add: I'm not completely sure, but a side-effect of this messages
> >> seems to be that resources (being cleaned up) that are running (e.g. Xen VMs)
> >> are considered "stopped". If the CRM tried to start the VM elsewhere, data
> >> corruption or other bad effects are likely...
> >>
> >> So I wonder: I thought that cleaning up a resource just resets the
> >> failed-count for the nodes where the resource couldn't start before. Does it
> >> (should it?) really clean the "running" status?
> >
> > This part is normal. Cleanup removes the resources state from the
> > cluster/LRM completely (this includes the failure counts), which is then
> > reprobed.
> >
> > This does not cause concurrency violations. Even though it is true that
> > the resource shows up as "not running" briefly in crm_mon/hawk.
> >
> > Perhaps a new state "not probed" would be useful, since the
> > probe_complete attribute is available in the CIB? Cc'ing Tim for his
> > opinion.
>
> Good point. Even if it's generally only a brief window where resources
> are shown as stopped after cleanup (even though they're never actually
> stopped), that could be confusing. In Hawk's case, the status display

Hi!

Let me remark that the "brief window" is long enough for a user to start crm_mon a few times after a cleanup and still see "Stopped" resources. Maybe it's more obvious for non-trivial setups. In my case I have 2 nodes, 85 primitives, 7 groups, 6 clones and 13 constraints.

> is implemented such that resources with no LRM state are reported as
> Stopped, where strictly they should probably show as Unknown (or, as you
> say, "Not Probed"). I'll make a note to do something about that.

I also wondered (without hard evidence of a problem) whether the cluster can handle interleaved sequences of resource cleanups (using individual resource names and node names) and the probes induced by those cleanups (as you stated). I had the impression that the cluster is more stable if you wait a significant time between individual cleanups.

>
> I'm not sure why crm_mon seems to show non-probed resources as Stopped
> (it's been some time since I went digging through the pengine/unpack code).

Regards,
Ulrich

>
> Regards,
>
> Tim

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
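
To illustrate the display distinction discussed in this thread (a missing LRM state shown as "Not Probed" rather than "Stopped"), here is a minimal sketch. It is not the actual Hawk or crm_mon code; the function name, the enum, and the state strings are all hypothetical and only model the idea that "no recorded state" is different from "recorded as stopped".

```python
from enum import Enum
from typing import Optional


class DisplayState(Enum):
    """Labels a status display could show (hypothetical names)."""
    STARTED = "Started"
    STOPPED = "Stopped"
    NOT_PROBED = "Not Probed"


def display_state(lrm_state: Optional[str]) -> DisplayState:
    """Map a recorded LRM state to a display label.

    Assumption: lrm_state is None when no state is recorded for the
    resource, e.g. right after a cleanup and before the re-probe has
    reported back. The strings "running" / "stopped" are illustrative.
    """
    if lrm_state is None:
        # No information yet: do not claim the resource is stopped.
        return DisplayState.NOT_PROBED
    if lrm_state == "running":
        return DisplayState.STARTED
    return DisplayState.STOPPED
```

With such a mapping, a resource that was just cleaned up would show as "Not Probed" until its probe result arrives, instead of appearing as "Stopped" while it is in fact still running.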
