Re: [Pacemaker] unknown third node added to a 2 node cluster?

2014-10-22 Thread Brian J. Murrell (brian)
On Mon, 2014-10-13 at 12:51 +1100, Andrew Beekhof wrote: > > Even the same address can be a problem. That brief window where things were > getting renewed can screw up corosync. But as I proved, there was no renewal at all during the period of this entire pacemaker run, so the use of DHCP here i

Re: [Pacemaker] unknown third node added to a 2 node cluster?

2014-10-10 Thread Brian J. Murrell (brian)
On Wed, 2014-10-08 at 12:39 +1100, Andrew Beekhof wrote: > On 8 Oct 2014, at 2:09 am, Brian J. Murrell (brian) > wrote: > > > Given a 2 node pacemaker-1.1.10-14.el6_5.3 cluster with nodes "node5" > > and "node6" I saw an "unknown" third nod

[Pacemaker] unknown third node added to a 2 node cluster?

2014-10-07 Thread Brian J. Murrell (brian)
Given a 2 node pacemaker-1.1.10-14.el6_5.3 cluster with nodes "node5" and "node6" I saw an "unknown" third node being added to the cluster, but only on node5: Sep 18 22:52:16 node5 corosync[17321]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 12: memb=2, new=0, lost=
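The membership events above, combined with the DHCP discussion in the follow-ups, point at corosync's sensitivity to node addressing. One common hardening step is to pin node addresses statically in corosync.conf rather than relying on DHCP-assigned ones. A minimal illustrative fragment for a corosync 1.x udpu setup is sketched below; the addresses and network are hypothetical, not taken from the thread.

```
# /etc/corosync/corosync.conf -- illustrative fragment only;
# addresses and network are hypothetical examples.
totem {
    version: 2
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
        member { memberaddr: 192.168.1.5 }   # node5
        member { memberaddr: 192.168.1.6 }   # node6
    }
}
```

With an explicit member list, a renewed or changed DHCP lease cannot silently introduce an address corosync does not expect.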

Re: [Pacemaker] error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again

2014-02-06 Thread Brian J. Murrell (brian)
On Thu, 2014-02-06 at 10:42 -0500, Brian J. Murrell (brian) wrote: > On Wed, 2014-01-08 at 13:30 +1100, Andrew Beekhof wrote: > > What version of pacemaker? > > Most recently I have been seeing this in 1.1.10 as shipped by RHEL6.5. Doh! Somebody did a test run that had not been

Re: [Pacemaker] error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again

2014-02-06 Thread Brian J. Murrell (brian)
On Wed, 2014-01-08 at 13:30 +1100, Andrew Beekhof wrote: > What version of pacemaker? Most recently I have been seeing this in 1.1.10 as shipped by RHEL6.5. > On 10 Dec 2013, at 4:40 am, Brian J. Murrell > wrote: > I didn't seem to get a response to any of the below questions. I was hoping t
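The rc=6 in these messages is corosync's CS_ERR_TRY_AGAIN, a transient error that callers are expected to retry. The generic retry-with-backoff pattern can be sketched as follows; `send` here is a stand-in for the real cpg send call (e.g. cpg_mcast_joined), not an actual libcpg binding.

```python
import time

CS_OK = 1             # corosync cs_error_t success value
CS_ERR_TRY_AGAIN = 6  # the cs_error_t value behind "rc=6"

def send_with_retry(send, payload, attempts=5, delay=0.01):
    """Retry a cpg-style send while it reports CS_ERR_TRY_AGAIN.

    `send` is a stand-in for the real cpg send call; it must return a
    cs_error_t-style integer. Backs off exponentially between attempts.
    """
    for i in range(attempts):
        rc = send(payload)
        if rc != CS_ERR_TRY_AGAIN:
            return rc
        time.sleep(delay * (2 ** i))
    return CS_ERR_TRY_AGAIN
```

If the error persists past a bounded retry window, as reported in this thread, it usually indicates a deeper problem (e.g. corosync overloaded or wedged) rather than ordinary backpressure.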

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-21 Thread Brian J. Murrell (brian)
On Thu, 2014-01-16 at 14:49 +1100, Andrew Beekhof wrote: > > What crm_mon are you looking at? > I see stuff like: > > virt-fencing (stonith:fence_xvm):Started rhos4-node3 > Resource Group: mysql-group > mysql-vip(ocf::heartbeat:IPaddr2): Started rhos4-node3 > mysql

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-15 Thread Brian J. Murrell (brian)
On Thu, 2014-01-16 at 08:35 +1100, Andrew Beekhof wrote: > > I know, I was giving you another example of when the cib is not completely > up-to-date with reality. Yeah, I understood that. I was just countering with why that example is actually more acceptable. > It may very well be partially s

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-15 Thread Brian J. Murrell (brian)
On Wed, 2014-01-15 at 17:11 +1100, Andrew Beekhof wrote: > > Consider any long running action, such as starting a database. > We do not update the CIB until after actions have completed, so there can and > will be times when the status section is out of date to one degree or another. But that is

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-14 Thread Brian J. Murrell (brian)
On Tue, 2014-01-14 at 16:01 +1100, Andrew Beekhof wrote: > > > On Tue, 2014-01-14 at 08:09 +1100, Andrew Beekhof wrote: > >> > >> The local cib hasn't caught up yet by the looks of it. I should have asked in my previous message: is this entirely an artifact of having just restarted or are there

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-13 Thread Brian J. Murrell (brian)
On Tue, 2014-01-14 at 08:09 +1100, Andrew Beekhof wrote: > > The local cib hasn't caught up yet by the looks of it. Should crm_resource actually be [mis-]reporting as if it were knowledgeable when it's not though? IOW is this expected behaviour or should it be considered a bug? Should I open a

[Pacemaker] crm_resource -L not trustable right after restart

2014-01-13 Thread Brian J. Murrell (brian)
Hi, I found a situation using pacemaker 1.1.10 on RHEL6.5 where the output of "crm_resource -L" is not trustable shortly after a node is booted. Here is the output of crm_resource -L on one of the nodes in a two-node cluster (the one that was not rebooted): st-fencing (stonith:fence_foo
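The underlying issue in this thread is that the local CIB's status section may lag reality right after a restart. One defensive approach is to check that every node has fully rejoined before trusting resource status. A sketch of that check is below; the node_state attribute names (crmd, join, in_ccm) are modeled on the pacemaker 1.1 CIB status section, so treat this as an assumption-laden sketch rather than a supported API.

```python
import xml.etree.ElementTree as ET

def all_nodes_joined(cib_xml):
    """Return True only if every <node_state> shows a fully joined node.

    Attribute names (crmd/join/in_ccm) follow the pacemaker 1.1 CIB
    status section; this is a sketch, not a supported interface.
    """
    root = ET.fromstring(cib_xml)
    states = root.findall(".//node_state")
    if not states:
        return False  # no status recorded yet: do not trust the CIB
    return all(
        ns.get("crmd") == "online"
        and ns.get("join") == "member"
        and ns.get("in_ccm") == "true"
        for ns in states
    )
```

In practice the XML would come from something like `cibadmin --query`; until this predicate holds, output from crm_resource -L is best treated as provisional.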

[Pacemaker] error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again

2013-12-06 Thread Brian J. Murrell (brian)
I seem to have another instance where pacemaker fails to exit at the end of a shutdown. Here's the log from the start of the "service pacemaker stop": Dec 3 13:00:39 wtm-60vm8 crmd[14076]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCES

[Pacemaker] prevent starting resources on failed node

2013-12-06 Thread Brian J. Murrell (brian)
[ Hopefully this doesn't cause a duplicate post but my first attempt returned an error. ] Using pacemaker 1.1.10 (but I think this issue is more general than that release), I want to enforce a policy that once a node fails, no resources can be started/run on it until the user permits it. I have b
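One common pattern for this kind of policy is to gate resource placement on a node attribute that an operator must set by hand: resources are only eligible on nodes where the attribute is present, and after a failure the attribute is cleared until a human re-approves the node. A hedged crm-shell sketch follows; the resource and attribute names are hypothetical, and the exact rule syntax should be checked against your crmsh version.

```
# crm shell fragment -- illustrative only; names are hypothetical.
# my-resource may run only on nodes where admin_ok is set to 1.
location only-on-approved-nodes my-resource \
    rule -inf: not_defined admin_ok or admin_ok ne 1

# After inspecting a failed node, an operator re-enables it with e.g.:
#   crm_attribute --node node5 --name admin_ok --update 1
```

The -INFINITY rule keeps pacemaker from ever starting the resource on an unapproved node, which matches the "no resources until the user permits it" policy described above.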