Re: [Linux-HA] cib_replace failed?

Robinson, Eric Tue, 13 Nov 2012 08:34:42 -0800

bump.

Could someone please review the logs in the links below and tell me what the 
heck is going on with this cluster? I've never encountered anything like this 
before. Basically, corosync thinks the cluster is healthy but Pacemaker won't 
elect a DC!


--

Hi Andrew,

> would love to see the logs from ha09b

Below are links to a clean set of logs from nodes ha09a and ha09b. The 
procedure I followed to collect the logs was:

1. Ensure pacemakerd and corosync are stopped on both nodes.
2. Remove corosync.log on both nodes.
3. Start corosync on ha09a.
4. Wait approx. 15 seconds.
5. Start corosync on ha09b.
6. Wait approx. 15 seconds.
7. Use 'corosync-cfgtool -s' to verify that the rings are active and without 
faults on both nodes.
8. Use 'corosync-objctl|grep member' on both nodes to verify cluster 
membership. (The nodeids look good to me.)

[root@ha09a log]# corosync-objctl |grep member
runtime.totem.pg.mrp.srp.members.973777088.ip=r(0) ip(192.168.10.58) r(1) ip(198.51.100.58)
runtime.totem.pg.mrp.srp.members.973777088.join_count=1
runtime.totem.pg.mrp.srp.members.973777088.status=joined
runtime.totem.pg.mrp.srp.members.990554304.ip=r(0) ip(192.168.10.59) r(1) ip(198.51.100.59)
runtime.totem.pg.mrp.srp.members.990554304.join_count=1
runtime.totem.pg.mrp.srp.members.990554304.status=joined

[root@ha09b log]# corosync-objctl |grep member
runtime.totem.pg.mrp.srp.members.990554304.ip=r(0) ip(192.168.10.59) r(1) ip(198.51.100.59)
runtime.totem.pg.mrp.srp.members.990554304.join_count=1
runtime.totem.pg.mrp.srp.members.990554304.status=joined
runtime.totem.pg.mrp.srp.members.973777088.ip=r(0) ip(192.168.10.58) r(1) ip(198.51.100.58)
runtime.totem.pg.mrp.srp.members.973777088.join_count=1
runtime.totem.pg.mrp.srp.members.973777088.status=joined

9. Start pacemaker on ha09a.
10. Wait approx. 15 seconds.
11. Start pacemaker on ha09b.
12. Wait approx. 15 seconds.
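
For anyone who wants to repeat the membership check from step 8 without 
comparing the output by eye, here is a minimal Python sketch. It is not 
something that was run on this cluster. The function names parse_members and 
same_view are made up for illustration, and the only input format assumed is 
the 'corosync-objctl | grep member' output shown above (shell prompt lines must 
be left out of the text passed in).

```python
import re

# Matches lines such as:
#   runtime.totem.pg.mrp.srp.members.973777088.status=joined
MEMBER_RE = re.compile(
    r"^runtime\.totem\.pg\.mrp\.srp\.members\.(\d+)\.(\w+)=(.*)$"
)


def parse_members(text):
    """Return {nodeid: {key: value}} parsed from the grep'd objctl output."""
    members = {}
    last = None  # (nodeid, key) of the previous entry, used for wrapped lines
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = MEMBER_RE.match(line)
        if match:
            nodeid, key, value = match.groups()
            members.setdefault(nodeid, {})[key] = value.strip()
            last = (nodeid, key)
        elif last is not None:
            # Treat a non-matching line as the tail of a wrapped entry
            # (mail clients tend to break the long ip= lines).
            nodeid, key = last
            members[nodeid][key] += " " + line
    return members


def same_view(view_a, view_b):
    """True if both nodes list the same node IDs and all are 'joined'."""
    all_joined = all(
        entry.get("status") == "joined"
        for view in (view_a, view_b)
        for entry in view.values()
    )
    return set(view_a) == set(view_b) and all_joined


# Output copied from the two nodes in this post.
HA09A = """\
runtime.totem.pg.mrp.srp.members.973777088.ip=r(0) ip(192.168.10.58) r(1) ip(198.51.100.58)
runtime.totem.pg.mrp.srp.members.973777088.join_count=1
runtime.totem.pg.mrp.srp.members.973777088.status=joined
runtime.totem.pg.mrp.srp.members.990554304.ip=r(0) ip(192.168.10.59) r(1) ip(198.51.100.59)
runtime.totem.pg.mrp.srp.members.990554304.join_count=1
runtime.totem.pg.mrp.srp.members.990554304.status=joined
"""

HA09B = """\
runtime.totem.pg.mrp.srp.members.990554304.ip=r(0) ip(192.168.10.59) r(1) ip(198.51.100.59)
runtime.totem.pg.mrp.srp.members.990554304.join_count=1
runtime.totem.pg.mrp.srp.members.990554304.status=joined
runtime.totem.pg.mrp.srp.members.973777088.ip=r(0) ip(192.168.10.58) r(1) ip(198.51.100.58)
runtime.totem.pg.mrp.srp.members.973777088.join_count=1
runtime.totem.pg.mrp.srp.members.973777088.status=joined
"""

if __name__ == "__main__":
    print(same_view(parse_members(HA09A), parse_members(HA09B)))
```

The check only compares node IDs and the status field, which is all step 8 
looks at. It says nothing about Pacemaker's own view of the cluster.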

At this point, the corosync.log file on ha09a was already about 90MB in size 
and ha09b's was about 70MB. crm_mon showed that no DC had been elected.
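
Logs of that size are hard to read directly, so a rough way to see which 
messages are being repeated is to count lines after masking the parts that 
change. The Python sketch below is only an illustration. It assumes each line 
starts with a syslog-style timestamp ("Nov 13 08:34:42 "), which may not match 
the actual corosync.log format on these nodes, and the function names and the 
"corosync.log" path are placeholders.

```python
import re
from collections import Counter

# Assumed prefix: "Mon DD HH:MM:SS " at the start of each line.
TIMESTAMP_RE = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+")
NUMBER_RE = re.compile(r"\d+")


def normalize(line):
    """Drop the timestamp and mask every run of digits with 'N'."""
    line = TIMESTAMP_RE.sub("", line.strip())
    return NUMBER_RE.sub("N", line)


def top_messages(lines, n=10):
    """Return the n most frequent normalized lines as (message, count)."""
    counts = Counter(normalize(line) for line in lines if line.strip())
    return counts.most_common(n)


def summarize(path, n=10):
    """Stream a log file from disk and return its most repeated messages."""
    with open(path, errors="replace") as handle:
        return top_messages(handle, n)


if __name__ == "__main__":
    # Placeholder path; point this at the real log file.
    for message, count in summarize("corosync.log"):
        print(count, message)
```

Masking digits merges lines that differ only in PIDs, counters or node IDs, so 
a message that repeats in a tight loop should rise to the top of the list.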

I tried to stop pacemaker, but on both nodes it just printed the following and 
hung forever:

[root@ha09a log]# service pacemaker stop
Signaling Pacemaker Cluster Manager to terminate:          [  OK  ]
Waiting for cluster services to unload:...............

So I had to reboot.

Here are the logs from both nodes.

www.psmnv.com/downloads/corosync_ha09a.log
www.psmnv.com/downloads/corosync_ha09b.log


--
Eric Robinson