> > 99% of the time, the resource will stop correctly, it is just on a few
> > occasions that I see an error like this.
> >
> > Is this a known problem, or can I generate extra logging to try help
> > debug?
>
> Never heard of it. That sounds quite serious. Yes, extra logging
> would be helpful. How often did that happen? Which releases do
> you run?

I reported it to this list, without any reply. I then also filed a bug
report:

http://developerbugs.linux-foundation.org/show_bug.cgi?id=2458

Also without a reply so far.

I looked into the lrmd code, and it seems to know only what is passed to
it as XML, so this is unlikely to be a cluster-glue issue. It would be
much easier to debug if lrmd knew about all resources and their required
parameters. It could then fail immediately without calling the RA. But
that is a design problem.

IMHO, the issue was introduced in pacemaker between 1.0.7 and 1.0.9, but
I do not have the time to track it down further. For now we simply
continue to use 1.0.7. As I reported to the list before, 1.0.8 randomly
fails to start resources, and since we typically have more than 30, 60
or even 120 resources, we then run into random issues all the time.

Cheers,
Bernd

-- 
Bernd Schubert
DataDirect Networks

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker