Hello,
Our cluster was working OK on the corosync stack, with corosync 2.3.0 and
pacemaker 1.1.8.
After upgrading (full versions and configs are below), we began to have
problems with node names.
It's a two-node cluster, with node names "turifel" (the DC) and "selavi".
When selavi joins the cluster, we get this warning in selavi's log:
-----
Jun 27 11:54:29 selavi attrd[11998]: notice: corosync_node_name:
Unable to get node name for nodeid 168385827
Jun 27 11:54:29 selavi attrd[11998]: notice: get_node_name: Defaulting
to uname -n for the local corosync node name
-----
This is OK, and it also happened with version 1.1.8.
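As far as I understand, attrd first asks corosync's cmap for a name attached
to that nodeid and only falls back to uname -n when none is configured. A
quick way to check that (a sketch, assuming corosync 2.x with
corosync-cmapctl installed):
----
# Dump any configured nodelist entries from the cmap; empty output means
# corosync has no node names to hand out, hence the uname -n fallback.
corosync-cmapctl | grep nodelist
----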
At the corosync level, everything seems OK:
----
Jun 27 11:51:18 turifel corosync[6725]: [TOTEM ] A processor joined or
left the membership and a new membership (10.9.93.35:1184) was formed.
Jun 27 11:51:18 turifel corosync[6725]: [QUORUM] Members[2]: 168385827
168385835
Jun 27 11:51:18 turifel corosync[6725]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 27 11:51:18 turifel crmd[19526]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node selavi[168385827] - state is now member
(was lost)
-------
But when starting pacemaker on selavi (the new node), turifel's log shows
this:
----
Jun 27 11:54:28 turifel crmd[19526]: notice: do_state_transition:
State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN
cause=C_FSA_INTERNAL origin=peer_update_callback ]
Jun 27 11:54:28 turifel crmd[19526]: warning: crm_get_peer: Node
'selavi' and 'selavi' share the same cluster nodeid: 168385827
Jun 27 11:54:28 turifel crmd[19526]: warning: crmd_cs_dispatch:
Recieving messages from a node we think is dead: selavi[0]
Jun 27 11:54:29 turifel crmd[19526]: warning: crm_get_peer: Node
'selavi' and 'selavi' share the same cluster nodeid: 168385827
Jun 27 11:54:29 turifel crmd[19526]: warning: do_state_transition: Only
1 of 2 cluster nodes are eligible to run resources - continue 0
Jun 27 11:54:29 turifel attrd[19524]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
----
And selavi remains in the pending state. Sometimes turifel (the DC) fences
selavi, but other times it stays pending forever.
On the turifel node, all resources give warnings like this one:
warning: custom_action: Action p_drbd_ha0:0_monitor_0 on selavi is
unrunnable (pending)
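My reading of the "share the same cluster nodeid" warning is that crmd may
have two peer entries for selavi, e.g. a stale one surviving from before the
upgrade. In case it helps, this is how I would look for such an entry (a
sketch, assuming the standard pacemaker CLI tools):
----
# List the nodes pacemaker currently knows about (name and nodeid).
crm_node -l
# Inspect the <nodes> section of the CIB for duplicate or stale entries.
cibadmin --query --scope nodes
# If a stale selavi entry turned up, it could be removed by name while
# pacemaker is stopped on selavi (untested on my side):
crm_node -R selavi --force
----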
On both nodes, uname -n and crm_node -n give the correct node names
(selavi and turifel respectively).
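The corosync side can be cross-checked as well (a sketch, assuming
corosync 2.x tooling):
----
# Membership as corosync sees it, with nodeids.
corosync-quorumtool -l
# Runtime member entries straight from the cmap.
corosync-cmapctl | grep members
----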
Do you think it's a configuration problem?
Below I give information about versions and configurations.
Best regards,
Bernardo.
-----
Versions (built from git/hg):
corosync: 2.3.0.66-615d
pacemaker: 1.1.9-61e4b8f
cluster-glue: 1.0.11
libqb: 0.14.4.43-bb4c3
resource-agents: 3.9.5.98-3b051
crmsh: 1.2.5
The cluster also runs drbd, dlm and gfs2, but I think their versions are
irrelevant here.
--------
Output of pacemaker configuration:
./configure --prefix=/opt/ha --without-cman \
--without-heartbeat --with-corosync \
--enable-fatal-warnings=no --with-lcrso-dir=/opt/ha/libexec/lcrso
pacemaker configuration:
Version = 1.1.9 (Build: 61e4b8f)
Features = generated-manpages ascii-docs ncurses
libqb-logging libqb-ipc lha-fencing upstart nagios corosync-native snmp
libesmtp
Prefix = /opt/ha
Executables = /opt/ha/sbin
Man pages = /opt/ha/share/man
Libraries = /opt/ha/lib
Header files = /opt/ha/include
Arch-independent files = /opt/ha/share
State information = /opt/ha/var
System configuration = /opt/ha/etc
Corosync Plugins = /opt/ha/lib
Use system LTDL = yes
HA group name = haclient
HA user name = hacluster
CFLAGS = -I/opt/ha/include -I/opt/ha/include
-I/opt/ha/include/heartbeat -I/opt/ha/include -I/opt/ha/include
-ggdb -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return
-Wbad-function-cast -Wcast-align -Wdeclaration-after-statement
-Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security
-Wformat-nonliteral -Wmissing-prototypes -Wmissing-declarations
-Wnested-externs -Wno-long-long -Wno-strict-aliasing
-Wunused-but-set-variable -Wpointer-arith -Wstrict-prototypes
-Wwrite-strings
Libraries = -lgnutls -lcorosync_common -lplumb -lpils
-lqb -lbz2 -lxslt -lxml2 -lc -luuid -lpam -lrt -ldl -lglib-2.0 -lltdl
-L/opt/ha/lib -lqb -ldl -lrt -lpthread
Stack Libraries = -L/opt/ha/lib -lqb -ldl -lrt -lpthread
-L/opt/ha/lib -lcpg -L/opt/ha/lib -lcfg -L/opt/ha/lib -lcmap
-L/opt/ha/lib -lquorum
----
Corosync config:
totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    cluster_name: fiestaha
    interface {
        ringnumber: 0
        ttl: 1
        bindnetaddr: 10.9.93.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}
logging {
    fileline: off
    to_stderr: yes
    to_logfile: no
    to_syslog: yes
    syslog_facility: local7
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
    wait_for_all: 0
}
--
APSL
*Bernardo Cabezas Serra*
*Responsable Sistemas*
Camí Vell de Bunyola 37, esc. A, local 7
07009 Polígono de Son Castelló, Palma
Mail: [email protected]
Skype: bernat.cabezas
Tel: 971439771