(12.12.05 02:02), David Vossel wrote:
----- Original Message -----
From: "Kazunori INOUE" <inouek...@intellilink.co.jp>
To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
Sent: Monday, December 3, 2012 11:41:56 PM
Subject: Re: [Pacemaker] node status does not change even if pacemakerd dies
(12.12.03 20:24), Andrew Beekhof wrote:
On Mon, Dec 3, 2012 at 8:15 PM, Kazunori INOUE <inouek...@intellilink.co.jp> wrote:
(12.11.30 23:52), David Vossel wrote:
----- Original Message -----
From: "Kazunori INOUE" <inouek...@intellilink.co.jp>
To: "pacemaker@oss" <pacemaker@oss.clusterlabs.org>
Sent: Friday, November 30, 2012 2:38:50 AM
Subject: [Pacemaker] node status does not change even if pacemakerd dies
Hi,
I am testing the latest version.
- ClusterLabs/pacemaker 9c13d14640(Nov 27, 2012)
- corosync 92e0f9c7bb(Nov 07, 2012)
- libqb 30a7871646(Nov 29, 2012)
Although I killed pacemakerd, the node status did not change.
[dev1 ~]$ pkill -9 pacemakerd
[dev1 ~]$ crm_mon
:
Stack: corosync
Current DC: dev2 (2472913088) - partition with quorum
Version: 1.1.8-9c13d14
2 Nodes configured, unknown expected votes
0 Resources configured.
Online: [ dev1 dev2 ]
[dev1 ~]$ ps -ef|egrep 'corosync|pacemaker'
root 11990 1 1 16:05 ? 00:00:00 corosync
496 12010 1 0 16:05 ? 00:00:00 /usr/libexec/pacemaker/cib
root 12011 1 0 16:05 ? 00:00:00 /usr/libexec/pacemaker/stonithd
root 12012 1 0 16:05 ? 00:00:00 /usr/libexec/pacemaker/lrmd
496 12013 1 0 16:05 ? 00:00:00 /usr/libexec/pacemaker/attrd
496 12014 1 0 16:05 ? 00:00:00 /usr/libexec/pacemaker/pengine
496 12015 1 0 16:05 ? 00:00:00 /usr/libexec/pacemaker/crmd
We want the node status to change to OFFLINE when stonith-enabled=false,
or to UNCLEAN when stonith-enabled=true.
That is, we want the functionality of the code that was deleted in this commit:
https://github.com/ClusterLabs/pacemaker/commit/dfdfb6c9087e644cb898143e198b240eb9a928b4
How are you launching pacemakerd? The systemd service script relaunches
pacemakerd on failure, and pacemakerd has the ability to attach to all the
old processes if they are still around, as if nothing happened.
-- Vossel
Hi David,
We are using RHEL6 and will keep using it for a while.
Therefore, I start it with the following commands.
$ /etc/init.d/pacemakerd start
or
$ service pacemaker start
Ok.
Are you using the pacemaker plugin?
When using cman or corosync 2.0, pacemakerd isn't strictly needed for
normal operation.
It's only there to shut down and/or respawn failed components.
We are using corosync 2.1,
so the service does not stop normally after pacemakerd has died.
$ pkill -9 pacemakerd
$ service pacemaker stop
$ echo $?
0
$ ps -ef|egrep 'corosync|pacemaker'
root 3807 1 0 13:10 ? 00:00:00 corosync
496 3827 1 0 13:10 ? 00:00:00 /usr/libexec/pacemaker/cib
root 3828 1 0 13:10 ? 00:00:00 /usr/libexec/pacemaker/stonithd
root 3829 1 0 13:10 ? 00:00:00 /usr/libexec/pacemaker/lrmd
496 3830 1 0 13:10 ? 00:00:00 /usr/libexec/pacemaker/attrd
496 3831 1 0 13:10 ? 00:00:00 /usr/libexec/pacemaker/pengine
496 3832 1 0 13:10 ? 00:00:00 /usr/libexec/pacemaker/crmd
Ah yes, that is a problem.
Having pacemaker still running when the init script says it is down... that is
bad. Perhaps we should just make the init script smart enough to check to make
sure all the pacemaker components are down after pacemakerd is down.
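A minimal sketch of such a check, not the actual init script. The function names leftover_components and verify_stopped are hypothetical, and the component list is simply the set of daemons visible in the ps listings in this thread. The filter reads command lines on stdin so it can be exercised without a live cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every Pacemaker component found
# in a list of command lines (one per line, as from `ps -e -o args=`).
# The component names are the ones seen in this thread's ps output.
leftover_components() {
    awk '$1 ~ /^\/usr\/libexec\/pacemaker\/(cib|stonithd|lrmd|attrd|pengine|crmd)$/ {
        n = split($1, parts, "/")
        print parts[n]
    }'
}

# Hypothetical extra step for the stop action: succeed only when no
# component is left once pacemakerd itself is gone.
verify_stopped() {
    left=$(ps -e -o args= | leftover_components)
    if [ -n "$left" ]; then
        # $left is unquoted on purpose so the names print on one line.
        echo "pacemaker components still running:" $left >&2
        return 1
    fi
    return 0
}
```

An init script could call verify_stopped after pacemakerd has exited and report failure if it returns non-zero, so that "service pacemaker stop" no longer returns 0 while cib, crmd and the others are still up.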
Whether the failure of pacemakerd is something that the cluster should be
alerted to is something I'm not sure about. With the
corosync 2.0 stack, pacemakerd really doesn't do anything except launch
processes/relaunch processes. A cluster can be completely functional without a
pacemakerd instance running anywhere. If any of the actual pacemaker
components on a node fail, the logic that causes that node to get fenced has
nothing to do with pacemakerd.
-- Vossel
Hi,
I think that the "relaunch processes" function of pacemakerd is very useful,
so I want to avoid managing resources on a node where pacemakerd is not
running.
The best solution is to relaunch pacemakerd, but if that is difficult,
I think that a shortcut is to make the node unclean.
I have also tried Upstart a little.
1) Started corosync and pacemaker.
$ cat /etc/init/pacemaker.conf
respawn
script
    [ -f /etc/sysconfig/pacemaker ] && {
        . /etc/sysconfig/pacemaker
    }
    exec /usr/sbin/pacemakerd
end script
$ service co start
Starting Corosync Cluster Engine (corosync): [ OK ]
$ initctl start pacemaker
pacemaker start/running, process 4702
$ ps -ef|egrep 'corosync|pacemaker'
root 4695 1 0 17:21 ? 00:00:00 corosync
root 4702 1 0 17:21 ? 00:00:00 /usr/sbin/pacemakerd
496 4703 4702 0 17:21 ? 00:00:00 /usr/libexec/pacemaker/cib
root 4704 4702 0 17:21 ? 00:00:00 /usr/libexec/pacemaker/stonithd
root 4705 4702 0 17:21 ? 00:00:00 /usr/libexec/pacemaker/lrmd
496 4706 4702 0 17:21 ? 00:00:00 /usr/libexec/pacemaker/attrd
496 4707 4702 0 17:21 ? 00:00:00 /usr/libexec/pacemaker/pengine
496 4708 4702 0 17:21 ? 00:00:00 /usr/libexec/pacemaker/crmd
2) Killed pacemakerd.
$ pkill -9 pacemakerd
$ ps -ef|egrep 'corosync|pacemaker'
root 4695 1 0 17:21 ? 00:00:01 corosync
496 4703 1 0 17:21 ? 00:00:00 /usr/libexec/pacemaker/cib
root 4704 1 0 17:21 ? 00:00:00 /usr/libexec/pacemaker/stonithd
root 4705 1 0 17:21 ? 00:00:00 /usr/libexec/pacemaker/lrmd
496 4706 1 0 17:21 ? 00:00:00 /usr/libexec/pacemaker/attrd
496 4707 1 0 17:21 ? 00:00:00 /usr/libexec/pacemaker/pengine
496 4708 1 0 17:21 ? 00:00:00 /usr/libexec/pacemaker/crmd
root 4760 1 1 17:24 ? 00:00:00 /usr/sbin/pacemakerd
3) Then I stopped pacemakerd. However, some processes did not stop.
$ initctl stop pacemaker
pacemaker stop/waiting
$ ps -ef|egrep 'corosync|pacemaker'
root 4695 1 0 17:21 ? 00:00:01 corosync
496 4703 1 0 17:21 ? 00:00:00 /usr/libexec/pacemaker/cib
root 4704 1 0 17:21 ? 00:00:00 /usr/libexec/pacemaker/stonithd
root 4705 1 0 17:21 ? 00:00:00 /usr/libexec/pacemaker/lrmd
496 4706 1 0 17:21 ? 00:00:00 /usr/libexec/pacemaker/attrd
496 4707 1 0 17:21 ? 00:00:00 /usr/libexec/pacemaker/pengine
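The listings in steps 2) and 3) show the components re-parented to PID 1 once their original pacemakerd is gone. A small sketch for spotting that state from "ps -ef" output follows. The function name orphaned_components is hypothetical, and the field positions assume the eight-column "ps -ef" layout shown above. Note that a PPID of 1 alone does not tell whether a respawned pacemakerd has re-attached to the process.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print
# "PID name" for every Pacemaker component whose parent is PID 1.
# Assumed columns: UID PID PPID C STIME TTY TIME CMD.
orphaned_components() {
    awk '$3 == 1 && $8 ~ /^\/usr\/libexec\/pacemaker\// {
        n = split($8, parts, "/")
        print $2, parts[n]
    }'
}
```

On a live node it could be run as "ps -ef | orphaned_components".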
Best Regards,
Kazunori INOUE
This isn't the case when the plugin is in use, though, but then I'd
have expected most of the processes to die as well.
If the processes died in that way, the node status would also change,
and that is the behavior we want.
----
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.3 (Santiago)
$ ./configure --sysconfdir=/etc --localstatedir=/var --without-cman --without-heartbeat
-snip-
pacemaker configuration:
Version = 1.1.8 (Build: 9c13d14)
Features = generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc lha-fencing corosync-native snmp
$ cat config.log
-snip-
| #define BUILD_VERSION "9c13d14"
| /* end confdefs.h. */
| #include <gio/gio.h>
|
| int
| main ()
| {
| if (sizeof (GDBusProxy))
| return 0;
| ;
| return 0;
| }
configure:32411: result: no
configure:32417: WARNING: Unable to support systemd/upstart. You need to use glib >= 2.26
-snip-
| #define BUILD_VERSION "9c13d14"
| #define SUPPORT_UPSTART 0
| #define SUPPORT_SYSTEMD 0
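The configure warning above names glib >= 2.26 as the requirement. A minimal sketch for checking the installed version before running configure follows. The function names version_ge and check_glib are hypothetical, the comparison relies on "sort -V" from GNU coreutils, and it assumes pkg-config knows about the glib-2.0 module.

```shell
#!/bin/sh
# version_ge A B: succeed when dotted version A is >= B.
# Assumes GNU `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical pre-flight check. 2.26 is the minimum version named in
# the configure warning quoted in this thread.
check_glib() {
    have=$(pkg-config --modversion glib-2.0 2>/dev/null) || {
        echo "glib-2.0 not found by pkg-config" >&2
        return 2
    }
    if version_ge "$have" 2.26; then
        echo "glib $have: meets the 2.26 minimum"
    else
        echo "glib $have: older than 2.26" >&2
        return 1
    fi
}
```

Running check_glib on the build host would show whether the SUPPORT_UPSTART and SUPPORT_SYSTEMD defines can be expected to end up as 0.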
Best Regards,
Kazunori INOUE
related bugzilla:
http://bugs.clusterlabs.org/show_bug.cgi?id=5064
Best Regards,
Kazunori INOUE
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org