Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond

2012-08-14 Thread Andrew Beekhof
On 05/07/2012, at 2:51 AM, Brian J. Murrell wrote: > On 12-07-04 04:27 AM, Andreas Kurz wrote: >> >> beside increasing the batch limit to a higher value ... did you also >> tune corosync totem timings? > > Not yet. > > But a closer look at the logs reveals a bunch of these: > > Jun 28 14:56:

Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond

2012-07-04 Thread Brian J. Murrell
On 12-07-04 04:27 AM, Andreas Kurz wrote: > > beside increasing the batch limit to a higher value ... did you also > tune corosync totem timings? Not yet. But a closer look at the logs reveals a bunch of these: Jun 28 14:56:56 node-2 corosync[30497]: [pcmk ] ERROR: send_cluster_msg_raw: Chi

Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond

2012-07-04 Thread Brian J. Murrell
On 12-07-04 02:12 AM, Andrew Beekhof wrote: > On Wed, Jul 4, 2012 at 10:06 AM, Brian J. Murrell > wrote: >> >> Just because I reduced the number of nodes doesn't mean that I reduced >> the parallelism any. > > Yes. You did. You reduced the number of "check what state the > resource is on every

Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond

2012-07-04 Thread Andreas Kurz
On 07/04/2012 12:36 AM, Brian J. Murrell wrote: > On 12-07-03 06:17 PM, Andrew Beekhof wrote: >> >> Even adding passive nodes multiplies the number of probe operations >> that need to be performed and loaded into the cib. > > So it seems. I just would not have thought they would be such a load since >
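The batch limit mentioned above caps how many actions the transition engine runs in parallel. As a rough illustration only, here is a minimal sketch that assumes every action takes the same time and has no ordering constraints (real transitions have both). It computes how many "waves" a set of actions needs under a given limit. The function name `transition_waves` and the action counts are hypothetical.

```python
import math


def transition_waves(num_actions: int, batch_limit: int) -> int:
    """Minimum number of parallel 'waves' needed to run num_actions
    when at most batch_limit actions may be in flight at once.

    Simplified model (assumption): equal-length actions, no ordering
    constraints between them.
    """
    if batch_limit < 1:
        raise ValueError("batch_limit must be at least 1")
    return math.ceil(num_actions / batch_limit)


# Hypothetical workload of 360 actions under two different limits.
for limit in (30, 60):
    print(limit, transition_waves(360, limit))
```

Under this model, a higher limit means fewer waves, but also more operations finishing close together and therefore more results being written into the CIB at about the same time.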

Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond

2012-07-03 Thread Andrew Beekhof
On Wed, Jul 4, 2012 at 10:06 AM, Brian J. Murrell wrote: > On 12-07-03 04:26 PM, David Vossel wrote: >> >> This is not a definite. Perhaps you are experiencing this given the >> pacemaker version you are running > > Yes, that is absolutely possible and it certainly has been under > consideration

Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond

2012-07-03 Thread Brian J. Murrell
On 12-07-03 04:26 PM, David Vossel wrote: > > This is not a definite. Perhaps you are experiencing this given the > pacemaker version you are running Yes, that is absolutely possible and it certainly has been under consideration throughout this process. I did also recognize however, that I am

Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond

2012-07-03 Thread Brian J. Murrell
On 12-07-03 06:17 PM, Andrew Beekhof wrote: > > Even adding passive nodes multiplies the number of probe operations > that need to be performed and loaded into the cib. So it seems. I just would not have thought they would be such a load since, from a simplistic perspective, they are not trying t

Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond

2012-07-03 Thread Andrew Beekhof
On Wed, Jul 4, 2012 at 5:15 AM, Brian J. Murrell wrote: > Thoughts? Even adding passive nodes multiplies the number of probe operations that need to be performed and loaded into the cib. Did you try any of the settings I suggested?
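To see why even passive nodes add load, here is a minimal back-of-the-envelope sketch. It assumes one probe (a monitor operation with interval 0) per resource per node, with each result recorded in the CIB status section. The function name `probe_operations` and the resource count are hypothetical; only the 18-node cluster size comes from the original post.

```python
def probe_operations(num_resources: int, num_nodes: int) -> int:
    """Probes needed to check the state of every resource on every node.

    Simplified model (assumption): one probe per (resource, node) pair,
    whether or not that node is ever expected to run the resource.
    """
    return num_resources * num_nodes


resources = 20  # hypothetical resource count
print(probe_operations(resources, 18))  # 18-node cluster from the first post
print(probe_operations(resources, 20))  # same cluster plus two passive nodes
```

The count grows with the product of resources and nodes, so adding a node adds one probe per configured resource even if that node never runs anything.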

Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond

2012-07-03 Thread David Vossel
- Original Message - > From: "Brian J. Murrell" > To: pacema...@clusterlabs.org > Sent: Tuesday, July 3, 2012 2:15:09 PM > Subject: Re: [Pacemaker] Call cib_query failed (-41): Remote node did not > respond > > On 12-06-27 11:30 PM, Andrew Beekhof w

Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond

2012-07-03 Thread Brian J. Murrell
On 12-06-27 11:30 PM, Andrew Beekhof wrote: > > The updates from you aren't the problem. It's the number of resource > operations (that need to be stored in the CIB) that result from your > changes that might be causing the problem. Just to follow this up for anyone currently following or anyone

Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond

2012-06-27 Thread Andrew Beekhof
On Thu, Jun 28, 2012 at 1:29 PM, Andrew Beekhof wrote: > On Wed, Jun 27, 2012 at 11:30 PM, Brian J. Murrell > wrote: >> On 12-06-26 09:54 PM, Andrew Beekhof wrote: >>> >>> The DC, possibly you didn't have one at that moment in time. >> >> It was the DC in fact. I restarted corosync on that node

Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond

2012-06-27 Thread Andrew Beekhof
On Wed, Jun 27, 2012 at 11:30 PM, Brian J. Murrell wrote: > On 12-06-26 09:54 PM, Andrew Beekhof wrote: >> >> The DC, possibly you didn't have one at that moment in time. > > It was the DC in fact. I restarted corosync on that node and the > timeouts went away. But note I "re"started, not starte

Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond

2012-06-27 Thread Brian J. Murrell
On 12-06-26 09:54 PM, Andrew Beekhof wrote: > > The DC, possibly you didn't have one at that moment in time. It was the DC in fact. I restarted corosync on that node and the timeouts went away. But note I "re"started, not started. It was running at the time, just not properly, apparently. > W

Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond

2012-06-26 Thread Andrew Beekhof
On Wed, Jun 27, 2012 at 6:45 AM, Brian J. Murrell wrote: > So, I have an 18 node cluster here (so a small haystack, indeed, but > still a haystack in which to try to find a needle) where a certain > set of (yet unknown, figuring that out is part of this process) > operations are pooching pacemaker

[Pacemaker] Call cib_query failed (-41): Remote node did not respond

2012-06-26 Thread Brian J. Murrell
So, I have an 18 node cluster here (so a small haystack, indeed, but still a haystack in which to try to find a needle) where a certain set of (yet unknown, figuring that out is part of this process) operations are pooching pacemaker. The symptom is that on one or more nodes I get the following ki