Re: [Pacemaker] Error: cluster is not currently running on this node

Miha Sun, 17 Aug 2014 22:14:53 -0700

Hi Emmanuel,

This is my config:

Pacemaker Nodes:
 sip1 sip2

Resources:
 Master: ms_drbd_mysql
  Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
  Resource: p_drbd_mysql (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=clusterdb_res
   Operations: monitor interval=29s role=Master (p_drbd_mysql-monitor-29s)
               monitor interval=31s role=Slave (p_drbd_mysql-monitor-31s)
 Group: g_mysql
  Resource: p_fs_mysql (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/drbd0 directory=/var/lib/mysql_drbd fstype=ext4
   Meta Attrs: target-role=Started
  Resource: p_ip_mysql (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=XXX.XXX.XXX.XXX cidr_netmask=24 nic=eth2
  Resource: p_mysql (class=ocf provider=heartbeat type=mysql)
   Attributes: datadir=/var/lib/mysql_drbd/data/ user=root group=root config=/var/lib/mysql_drbd/my.cnf pid=/var/run/mysqld/mysqld.pid socket=/var/lib/mysql/mysql.sock binary=/usr/bin/mysqld_safe additional_parameters="--bind-address=212.13.249.55 --user=root"
   Meta Attrs: target-role=Started
   Operations: start interval=0 timeout=120s (p_mysql-start-0)
               stop interval=0 timeout=120s (p_mysql-stop-0)
               monitor interval=20s timeout=30s (p_mysql-monitor-20s)
 Clone: cl_ping
  Meta Attrs: interleave=true
  Resource: p_ping (class=ocf provider=pacemaker type=ping)
   Attributes: name=ping multiplier=1000 host_list=XXX.XXX.XXX.XXXX
   Operations: monitor interval=15s timeout=60s (p_ping-monitor-15s)
               start interval=0s timeout=60s (p_ping-start-0s)
               stop interval=0s (p_ping-stop-0s)
 Resource: opensips (class=lsb type=opensips)
  Meta Attrs: target-role=Started
  Operations: start interval=0 timeout=120 (opensips-start-0)
              stop interval=0 timeout=120 (opensips-stop-0)

Stonith Devices:
 Resource: fence_sip1 (class=stonith type=fence_bladecenter_snmp)
  Attributes: action=off ipaddr=172.30.0.2 port=8 community=test login=snmp8 passwd=soft1234
  Meta Attrs: target-role=Started
 Resource: fence_sip2 (class=stonith type=fence_bladecenter_snmp)
  Attributes: action=off ipaddr=172.30.0.2 port=9 community=test1 login=snmp8 passwd=soft1234
  Meta Attrs: target-role=Started
Fencing Levels:

Location Constraints:
  Resource: ms_drbd_mysql
    Constraint: l_drbd_master_on_ping
      Rule: score=-INFINITY role=Master boolean-op=or (id:l_drbd_master_on_ping-rule)
        Expression: not_defined ping (id:l_drbd_master_on_ping-expression)
        Expression: ping lte 0 type=number (id:l_drbd_master_on_ping-expression-0)

Ordering Constraints:
  promote ms_drbd_mysql then start g_mysql (INFINITY) (id:o_drbd_before_mysql)
  g_mysql then start opensips (INFINITY) (id:opensips_after_mysql)
Colocation Constraints:
  g_mysql with ms_drbd_mysql (INFINITY) (with-rsc-role:Master) (id:c_mysql_on_drbd)
  opensips with g_mysql (INFINITY) (id:c_opensips_on_mysql)

Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.10-14.el6-368c726
 no-quorum-policy: ignore
 stonith-enabled: true
Node Attributes:
 sip1: standby=off
 sip2: standby=off


br
miha

On 8/14/2014 3:05 PM, emmanuel segura wrote:

ncomplete=10, Source=/var/lib/pacemaker/pengine/pe-warn-7.bz2): Stopped
Jul 03 14:10:51 [2701] sip2       crmd:   notice: too_many_st_failures: No devices found in cluster to fence sip1, giving up

Jul 03 14:10:54 [2697] sip2 stonith-ng:     info: stonith_command: Processed st_query reply from sip2: OK (0)
Jul 03 14:10:54 [2697] sip2 stonith-ng:    error: remote_op_done: Operation reboot of sip1 by sip2 for stonith_admin.cman.28299@sip2.94474607: No such device

Jul 03 14:10:54 [2697] sip2 stonith-ng:     info: stonith_command: Processed st_notify reply from sip2: OK (0)
Jul 03 14:10:54 [2701] sip2       crmd:   notice: tengine_stonith_notify: Peer sip1 was not terminated (reboot) by sip2 for sip2: No such device (ref=94474607-8cd2-410d-bbf7-5bc7df614a50) by client stonith_admin.cman.28299
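
The fencing failures in a log excerpt like this one can be picked out mechanically. What follows is a minimal Python sketch and not part of the original thread. The regular expression, the marker strings, and the helper name `fencing_failures` are assumptions based only on the lines quoted here, so other Pacemaker versions or log formats may need a different pattern.

```python
import re

# Sample lines taken from the log excerpt quoted in this message, with the
# mail-client line wrapping undone (one log entry per line).
LOG = """\
Jul 03 14:10:51 [2701] sip2       crmd:   notice: too_many_st_failures: No devices found in cluster to fence sip1, giving up
Jul 03 14:10:54 [2697] sip2 stonith-ng:     info: stonith_command: Processed st_query reply from sip2: OK (0)
Jul 03 14:10:54 [2697] sip2 stonith-ng:    error: remote_op_done: Operation reboot of sip1 by sip2 for stonith_admin.cman.28299@sip2.94474607: No such device
"""

# Assumed line layout: timestamp, [pid], host, daemon, severity, function, message.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} [\d:]{8}) \[(?P<pid>\d+)\] (?P<host>\S+)\s+"
    r"(?P<daemon>[\w-]+):\s+(?P<level>\w+):\s+(?P<func>\w+):\s+(?P<msg>.*)$"
)

# Assumed markers: the two failure phrases that appear in this excerpt.
FAILURE_MARKERS = ("No such device", "No devices found")


def fencing_failures(text):
    """Return parsed entries whose message contains a fencing-failure marker."""
    hits = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m and any(marker in m["msg"] for marker in FAILURE_MARKERS):
            hits.append(m.groupdict())
    return hits


failures = fencing_failures(LOG)
for entry in failures:
    print(entry["ts"], entry["daemon"], entry["func"], "->", entry["msg"])
```

Filtering on the message text rather than on severity is deliberate here: in this excerpt one of the two failure lines is logged at `notice` level, so a filter on `error` alone would miss it.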

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Sorry for the short answer. Have you tested your cluster fencing? Can
you show your cluster.conf XML?

2014-08-14 14:44 GMT+02:00 Miha <m...@softnet.si>:

Emmanuel,

Thanks. But how can I find out why fencing stopped working?

br
miha

On 8/14/2014 2:35 PM, emmanuel segura wrote:

Node sip2 is shown as UNCLEAN (offline) because cluster fencing
failed to complete the operation.

2014-08-14 14:13 GMT+02:00 Miha <m...@softnet.si>:

Hi,

Another thing.

On node 1 (sip1), pcs is running:
[root@sip1 ~]# pcs status
Cluster name: sipproxy
Last updated: Thu Aug 14 14:13:37 2014
Last change: Sat Feb  1 20:10:48 2014 via crm_attribute on sip1
Stack: cman
Current DC: sip1 - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured
10 Resources configured


Node sip2: UNCLEAN (offline)
Online: [ sip1 ]

Full list of resources:

   Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
       Masters: [ sip2 ]
       Slaves: [ sip1 ]
   Resource Group: g_mysql
       p_fs_mysql (ocf::heartbeat:Filesystem):    Started sip2
       p_ip_mysql (ocf::heartbeat:IPaddr2):       Started sip2
       p_mysql    (ocf::heartbeat:mysql): Started sip2
   Clone Set: cl_ping [p_ping]
       Started: [ sip1 sip2 ]
   opensips       (lsb:opensips): Stopped
   fence_sip1     (stonith:fence_bladecenter_snmp):       Started sip2
   fence_sip2     (stonith:fence_bladecenter_snmp):       Started sip2


[root@sip1 ~]#
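
Status output like the above can be checked programmatically for the condition discussed in this thread: a node marked UNCLEAN while resources are still listed as started on it. The following is a minimal Python sketch and not part of the original thread. The helper names `unclean_nodes` and `resources_reported_on` are hypothetical, and the patterns assume the exact line layout of this particular `pcs status` output (Pacemaker 1.1.10 on cman). Note that the listing reflects the last state known to sip1, not necessarily what is actually running on sip2.

```python
import re

# Excerpt of the `pcs status` output shown above (as seen from sip1).
STATUS = """\
Node sip2: UNCLEAN (offline)
Online: [ sip1 ]

   Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
       Masters: [ sip2 ]
       Slaves: [ sip1 ]
   Resource Group: g_mysql
       p_fs_mysql (ocf::heartbeat:Filesystem):    Started sip2
       p_ip_mysql (ocf::heartbeat:IPaddr2):       Started sip2
       p_mysql    (ocf::heartbeat:mysql): Started sip2
   Clone Set: cl_ping [p_ping]
       Started: [ sip1 sip2 ]
   opensips       (lsb:opensips): Stopped
   fence_sip1     (stonith:fence_bladecenter_snmp):       Started sip2
   fence_sip2     (stonith:fence_bladecenter_snmp):       Started sip2
"""


def unclean_nodes(text):
    """Names of nodes that the status output marks as UNCLEAN."""
    return re.findall(r"^Node (\S+): UNCLEAN", text, flags=re.MULTILINE)


def resources_reported_on(text, node):
    """Primitive resources that the output lists as 'Started <node>'."""
    pattern = (
        r"^[ \t]*(\S+)[ \t]+\([^)]*\):[ \t]+Started "
        + re.escape(node)
        + r"[ \t]*$"
    )
    return re.findall(pattern, text, flags=re.MULTILINE)


for node in unclean_nodes(STATUS):
    stale = resources_reported_on(STATUS, node)
    print(f"{node} is UNCLEAN; resources last reported there: {stale}")
```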





On 8/14/2014 2:12 PM, Miha wrote:

Hi Emmanuel,

I think so. What is the best way to check?

Sorry for my noob question. I configured this 6 months ago and
everything was working fine until now. Now I need to find out what really
happened before I do something stupid.



Thanks

On 8/14/2014 1:58 PM, emmanuel segura wrote:

Are you sure your cluster fencing is working?

2014-08-14 13:40 GMT+02:00 Miha <m...@softnet.si>:

Hi,

Today I noticed a problem with the cluster: the master server is shown
as offline, but the virtual IP is still assigned to it and all services
are running properly (for production).

When I run the following commands, I get these messages:

[root@sip2 cluster]# pcs status
Error: cluster is not currently running on this node
[root@sip2 cluster]# /etc/init.d/corosync status
corosync dead but pid file exists
[root@sip2 cluster]# pcs status
Error: cluster is not currently running on this node
[root@sip2 cluster]#
[root@sip2 cluster]#
[root@sip2 cluster]# tailf fenced.log
Aug 14 13:34:25 fenced cman_get_cluster error -1 112


The main question is what to do now. Should I run "pcs start" and hope
for the best, or what?

I have pasted log in pastebin: http://pastebin.com/SUp2GcmN

tnx!

miha

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started:
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
