>>> Tim Serong <[email protected]> wrote on 06.02.2013 at 03:50 in message
<[email protected]>:
> On 02/05/2013 10:31 PM, Lars Marowsky-Bree wrote:
> > On 2013-02-05T11:36:30, Ulrich Windl <[email protected]> 
> > wrote:
> > 
> > This looks like a support incident to me. Hard to diagnose without full
> > logs.
> > 
> >> Let me add: I'm not completely sure, but a side-effect of these messages
> >> seems to be that resources (being cleaned up) that are running (e.g. Xen
> >> VMs) are considered "stopped". If the CRM tried to start the VM elsewhere,
> >> data corruption or other bad effects would be likely...
> >>
> >> So I wonder: I thought that cleaning up a resource just resets the
> >> fail-count for the nodes where the resource couldn't start before. Does it
> >> (should it?) really clear the "running" status?
> > 
> > This part is normal. Cleanup removes the resource's state from the
> > cluster/LRM completely (this includes the failure counts), and the
> > resource is then reprobed.
> > 
> > This does not cause concurrency violations, even though it is true that
> > the resource briefly shows up as "not running" in crm_mon/Hawk.
> > 
> > Perhaps a new state "not probed" would be useful, since the
> > probe_complete attribute is available in the CIB? Cc'ing Tim for his
> > opinion.
> 
> Good point.  Even if it's generally only a brief window where resources
> are shown as stopped after cleanup (even though they're never actually
> stopped), that could be confusing.  In Hawk's case, the status display

Hi!

Let me remark that the "brief window" is long enough for a user to run
crm_mon several times after a cleanup and still see "Stopped" resources. Maybe
this is more noticeable in non-trivial setups.

In my case I have: 2 nodes, 85 primitives, 7 groups, 6 clones and 13 constraints

> is implemented such that resources with no LRM state are reported as
> Stopped, where strictly they should probably show as Unknown (or, as you
> say, "Not Probed").  I'll make a note to do something about that.
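A minimal sketch of the distinction Tim describes: a status display that reports a resource with no LRM state on a node as "Not Probed" instead of "Stopped". The data model (a dict mapping node name to the last known running state, with None or a missing key meaning "no LRM state recorded") and the names `Status` and `display_status` are hypothetical; this is not how Hawk or crm_mon are actually implemented.

```python
from enum import Enum
from typing import Dict, Optional


class Status(Enum):
    STARTED = "Started"
    STOPPED = "Stopped"
    NOT_PROBED = "Not Probed"


def display_status(lrm_state: Dict[str, Optional[bool]], node: str) -> Status:
    """Return what a status display could show for one resource on one node.

    Hypothetical model: lrm_state maps node name -> True (last probe/monitor
    found the resource running), False (found it stopped), or None / missing
    (no LRM state recorded, e.g. just after a cleanup, before the re-probe).
    """
    running = lrm_state.get(node)
    if running is None:
        # No history at all: the resource may well still be running, so
        # reporting "Stopped" here would be misleading.
        return Status.NOT_PROBED
    return Status.STARTED if running else Status.STOPPED


if __name__ == "__main__":
    # Right after a cleanup on node1, the resource's LRM state there is gone.
    state = {"node1": None, "node2": False}
    print(display_status(state, "node1").value)  # Not Probed
    print(display_status(state, "node2").value)  # Stopped
```

The point of the separate state is that "we have no information yet" is kept distinct from "a probe found it stopped".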

I also wondered (without hard evidence of a problem) whether the cluster can
handle interleaved sequences of resource cleanups (using individual resource
names and node names) and the probes induced by each cleanup (as you stated). I
had the impression that the cluster is more stable if you wait a significant
time between individual cleanups.
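If cleanups are to be spaced out as described above, a script can run them one at a time with a pause in between. This is only a sketch: it assumes the Pacemaker 1.1-era command line `crm_resource --cleanup --resource NAME --node NODE`, the helper names and the default delay are hypothetical, and a fixed pause is a crude stand-in for actually checking that the re-probe has finished. The `run` and `sleep` parameters allow a dry run that prints the commands instead of executing them.

```python
import subprocess
import time
from typing import Callable, Iterable, List, Sequence, Tuple


def cleanup_command(resource: str, node: str) -> List[str]:
    """Build the crm_resource command line that cleans up one resource on one node."""
    return ["crm_resource", "--cleanup", "--resource", resource, "--node", node]


def serial_cleanup(
    targets: Iterable[Tuple[str, str]],
    delay_s: float = 30.0,  # hypothetical pause; tune for the cluster's size
    run: Callable[[Sequence[str]], object] = lambda cmd: subprocess.run(cmd, check=True),
    sleep: Callable[[float], object] = time.sleep,
) -> None:
    """Clean up (resource, node) pairs one at a time, pausing between them
    so the probe triggered by one cleanup has time to finish before the next."""
    for i, (resource, node) in enumerate(targets):
        if i > 0:
            sleep(delay_s)
        run(cleanup_command(resource, node))


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them, and don't sleep.
    serial_cleanup(
        [("vm1", "node1"), ("vm2", "node1")],
        run=lambda cmd: print(" ".join(cmd)),
        sleep=lambda s: None,
    )
```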

> 
> I'm not sure why crm_mon seems to show non-probed resources as Stopped
> (it's been some time since I went digging through the pengine/unpack code).

Regards,
Ulrich

> 
> Regards,
> 
> Tim


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
