So, I have an 18-node cluster here (a small haystack, indeed, but still a haystack in which to try to find a needle) where a certain set of operations (as yet unknown; figuring that out is part of this process) is pooching Pacemaker. The symptom is that on one or more nodes I get the following kinds of errors:

# cibadmin -Q
Call cib_query failed (-41): Remote node did not respond

along with similar things in the log:

Jun 26 19:51:38 iu-18 crmd: [19119]: WARN: cib_rsc_callback: Resource update 7 failed: (rc=-41) Remote node did not respond
Jun 26 19:51:38 iu-18 crmd: [19119]: WARN: cib_rsc_callback: Resource update 8 failed: (rc=-41) Remote node did not respond
Jun 26 19:51:38 iu-18 crmd: [19119]: WARN: cib_rsc_callback: Resource update 9 failed: (rc=-41) Remote node did not respond
Jun 26 19:51:38 iu-18 crmd: [19119]: WARN: cib_rsc_callback: Resource update 10 failed: (rc=-41) Remote node did not respond
Jun 26 19:51:39 iu-18 crmd: [19119]: WARN: cib_rsc_callback: Resource update 11 failed: (rc=-41) Remote node did not respond

Clearly some node in the cluster has a problem, but nothing in any of these messages is helping me figure out which one it is. Any hints on how I figure out which node this "iu-18" node is having problems communicating with?

Cheers,
b.
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org