pcs is another package you will need to install.

On Oct 1, 2013 9:04 AM, "David Parker" <dpar...@utica.edu> wrote:
> Hello,
>
> Sorry for the delay in my reply.  I've been doing a lot of
> experimentation, but so far I've had no luck.
>
> Thanks for the suggestion, but it seems I'm not able to use CMAN.  I'm
> running Debian Wheezy with Corosync and Pacemaker installed via apt-get.
> When I installed CMAN and set up a cluster.conf file, Pacemaker refused
> to start and said that CMAN was not supported.  When CMAN is not
> installed, Pacemaker starts up fine, but I see these lines in the log:
>
> Sep 30 23:36:29 test-vm-1 crmd: [6941]: ERROR: init_quorum_connection:
>   The Corosync quorum API is not supported in this build
> Sep 30 23:36:29 test-vm-1 pacemakerd: [6932]: ERROR: pcmk_child_exit:
>   Child process crmd exited (pid=6941, rc=100)
> Sep 30 23:36:29 test-vm-1 pacemakerd: [6932]: WARN: pcmk_child_exit:
>   Pacemaker child process crmd no longer wishes to be respawned.
>   Shutting ourselves down.
>
> So, then I checked to see which plugins are supported:
>
> # pacemakerd -F
> Pacemaker 1.1.7 (Build: ee0730e13d124c3d58f00016c3376a1de5323cff)
>  Supporting: generated-manpages agent-manpages ncurses heartbeat
>  corosync-plugin snmp libesmtp
>
> Am I correct in believing that this Pacemaker package has been compiled
> without support for any quorum API?  If so, does anyone know if there
> is a Debian package which has the correct support?
>
> I also tried compiling libqb, Corosync and Pacemaker from source via
> git, following the instructions documented here:
>
> http://clusterlabs.org/wiki/SourceInstall
>
> I was hopeful that this would work, because as I understand it,
> Corosync 2.x no longer uses CMAN.  Everything compiled and started
> fine, but the compiled version of Pacemaker did not include either the
> 'crm' or 'pcs' commands.  Do I need to install something else in order
> to get one of these?
>
> Any and all help is greatly appreciated!
>
> Thanks,
> Dave
>
>
> On Wed, Sep 25, 2013 at 6:08 AM, David Lang <da...@lang.hm> wrote:
>
>> The cluster is trying to reach quorum (a majority of the nodes talking
>> to each other), and that is never going to happen with only one node,
>> so you have to disable this.
>>
>> Try putting
>>
>>   <cman two_node="1" expected_votes="1" transport="udpu"/>
>>
>> in your cluster.conf.
>>
>> David Lang
>>
>> On Tue, 24 Sep 2013, David Parker wrote:
>>
>>> Date: Tue, 24 Sep 2013 11:48:59 -0400
>>> From: David Parker <dpar...@utica.edu>
>>> Reply-To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
>>> To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
>>> Subject: Re: [Pacemaker] Corosync won't recover when a node fails
>>>
>>> I forgot to mention, the OS is Debian Wheezy 64-bit, Corosync and
>>> Pacemaker are installed from packages via apt-get, and there are no
>>> local firewall rules in place:
>>>
>>> # iptables -L
>>> Chain INPUT (policy ACCEPT)
>>> target     prot opt source               destination
>>>
>>> Chain FORWARD (policy ACCEPT)
>>> target     prot opt source               destination
>>>
>>> Chain OUTPUT (policy ACCEPT)
>>> target     prot opt source               destination
>>>
>>>
>>> On Tue, Sep 24, 2013 at 11:41 AM, David Parker <dpar...@utica.edu> wrote:
>>>
>>>> Hello,
>>>>
>>>> I have a 2-node cluster using Corosync and Pacemaker, where the
>>>> nodes are actually two VirtualBox VMs on the same physical machine.
>>>> I have some resources set up in Pacemaker, and everything works fine
>>>> if I move them in a controlled way with the "crm_resource -r
>>>> <resource> --move --node <node>" command.
>>>>
>>>> However, when I hard-fail one of the nodes via the "poweroff" command
>>>> in VirtualBox, which "pulls the plug" on the VM, the resources do not
>>>> move, and I see the following output in the log on the remaining node:
>>>>
>>>> Sep 24 11:20:30 corosync [TOTEM ] The token was lost in the OPERATIONAL state.
>>>> Sep 24 11:20:30 corosync [TOTEM ] A processor failed, forming new configuration.
>>>> Sep 24 11:20:30 corosync [TOTEM ] entering GATHER state from 2.
>>>> Sep 24 11:20:31 test-vm-1 lrmd: [2503]: debug: rsc:drbd_r0:0 monitor[31] (pid 8495)
>>>> drbd[8495]: 2013/09/24_11:20:31 WARNING: This resource agent is deprecated
>>>>   and may be removed in a future release. See the man page for details.
>>>>   To suppress this warning, set the "ignore_deprecation" resource parameter to true.
>>>> drbd[8495]: 2013/09/24_11:20:31 WARNING: This resource agent is deprecated
>>>>   and may be removed in a future release. See the man page for details.
>>>>   To suppress this warning, set the "ignore_deprecation" resource parameter to true.
>>>> drbd[8495]: 2013/09/24_11:20:31 DEBUG: r0: Calling drbdadm -c /etc/drbd.conf role r0
>>>> drbd[8495]: 2013/09/24_11:20:31 DEBUG: r0: Exit code 0
>>>> drbd[8495]: 2013/09/24_11:20:31 DEBUG: r0: Command output: Secondary/Primary
>>>> drbd[8495]: 2013/09/24_11:20:31 DEBUG: r0: Calling drbdadm -c /etc/drbd.conf cstate r0
>>>> drbd[8495]: 2013/09/24_11:20:31 DEBUG: r0: Exit code 0
>>>> drbd[8495]: 2013/09/24_11:20:31 DEBUG: r0: Command output: Connected
>>>> drbd[8495]: 2013/09/24_11:20:31 DEBUG: r0 status: Secondary/Primary Secondary Primary Connected
>>>> Sep 24 11:20:31 test-vm-2 lrmd: [2503]: info: operation monitor[31] on drbd_r0:0
>>>>   for client 2506: pid 8495 exited with return code 0
>>>> Sep 24 11:20:32 corosync [TOTEM ] entering GATHER state from 0.
>>>> Sep 24 11:20:34 corosync [TOTEM ] The consensus timeout expired.
>>>> Sep 24 11:20:34 corosync [TOTEM ] entering GATHER state from 3.
>>>> Sep 24 11:20:36 corosync [TOTEM ] The consensus timeout expired.
>>>> Sep 24 11:20:36 corosync [TOTEM ] entering GATHER state from 3.
>>>> Sep 24 11:20:38 corosync [TOTEM ] The consensus timeout expired.
>>>> Sep 24 11:20:38 corosync [TOTEM ] entering GATHER state from 3.
>>>> Sep 24 11:20:40 corosync [TOTEM ] The consensus timeout expired.
>>>> Sep 24 11:20:40 corosync [TOTEM ] entering GATHER state from 3.
>>>> Sep 24 11:20:40 corosync [TOTEM ] Totem is unable to form a cluster because of an
>>>>   operating system or network fault. The most common cause of this message is
>>>>   that the local firewall is configured improperly.
>>>> Sep 24 11:20:43 corosync [TOTEM ] The consensus timeout expired.
>>>> Sep 24 11:20:43 corosync [TOTEM ] entering GATHER state from 3.
>>>> Sep 24 11:20:43 corosync [TOTEM ] Totem is unable to form a cluster because of an
>>>>   operating system or network fault. The most common cause of this message is
>>>>   that the local firewall is configured improperly.
>>>> Sep 24 11:20:45 corosync [TOTEM ] The consensus timeout expired.
>>>> Sep 24 11:20:45 corosync [TOTEM ] entering GATHER state from 3.
>>>> Sep 24 11:20:45 corosync [TOTEM ] Totem is unable to form a cluster because of an
>>>>   operating system or network fault. The most common cause of this message is
>>>>   that the local firewall is configured improperly.
>>>> Sep 24 11:20:47 corosync [TOTEM ] The consensus timeout expired.
>>>>
>>>> Those last 3 messages just repeat over and over, the cluster never
>>>> recovers, and the resources never move.  "crm_mon" reports that the
>>>> resources are still running on the dead node, and shows no indication
>>>> that anything has gone wrong.
>>>>
>>>> Does anyone know what the issue could be?  My expectation was that
>>>> the remaining node would become the sole member of the cluster, take
>>>> over the resources, and everything would keep running.
>>>>
>>>> For reference, my corosync.conf file is below:
>>>>
>>>> compatibility: whitetank
>>>>
>>>> totem {
>>>>         version: 2
>>>>         secauth: off
>>>>         interface {
>>>>                 member {
>>>>                         memberaddr: 192.168.25.201
>>>>                 }
>>>>                 member {
>>>>                         memberaddr: 192.168.25.202
>>>>                 }
>>>>                 ringnumber: 0
>>>>                 bindnetaddr: 192.168.25.0
>>>>                 mcastport: 5405
>>>>         }
>>>>         transport: udpu
>>>> }
>>>>
>>>> logging {
>>>>         fileline: off
>>>>         to_logfile: yes
>>>>         to_syslog: yes
>>>>         debug: on
>>>>         logfile: /var/log/cluster/corosync.log
>>>>         timestamp: on
>>>>         logger_subsys {
>>>>                 subsys: AMF
>>>>                 debug: on
>>>>         }
>>>> }
>>>>
>>>> Thanks!
>>>> Dave
>>>>
>>>> --
>>>> Dave Parker
>>>> Systems Administrator
>>>> Utica College
>>>> Integrated Information Technology Services
>>>> (315) 792-3229
>>>> Registered Linux User #408177

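On the original problem: with the packaged Pacemaker 1.1.7, a two-node
cluster loses quorum the moment one node dies, and the default
no-quorum-policy of "stop" keeps the survivor from taking anything over.
The usual two-node workaround documented in Clusters from Scratch is to
tell Pacemaker to carry on without quorum; a minimal sketch, assuming the
crm shell from the Wheezy packages is available (this addresses the quorum
side only, not the repeating Totem errors above):

    # let Pacemaker keep managing resources when quorum is lost
    crm configure property no-quorum-policy=ignore

    # equivalent without the crm shell
    crm_attribute --type crm_config --name no-quorum-policy --update ignore

Without working fencing, ignoring quorum on two nodes does risk
split-brain, so treat this as a sketch of the quorum side of the problem
rather than a complete fix.
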
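For the corosync 2.x stack built from source, CMAN and cluster.conf
(including the two_node line suggested above) no longer apply; the
two-node special case is expressed directly in corosync.conf through
votequorum. A sketch of the relevant section, for corosync 2.x only:

    quorum {
            provider: corosync_votequorum
            # allow the single surviving node to retain quorum
            # in a 2-node cluster
            two_node: 1
    }

Note that two_node implicitly enables wait_for_all, so both nodes have to
be seen once after startup before a lone node will claim quorum.
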
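On the missing commands: neither crm nor pcs ships inside the Pacemaker
source tree any more; crm comes from the separate crmsh project and pcs is
its own package, which is what the reply at the top of this thread is
pointing at. A rough sketch of getting one of them onto a source-built
node (package availability and the repository location are assumptions,
not something confirmed in this thread):

    # if your Debian release carries the packages
    apt-get install pcs          # or: apt-get install crmsh

    # otherwise build pcs from its upstream repository
    git clone https://github.com/ClusterLabs/pcs.git
    cd pcs && make install       # build steps vary by version; check the README
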
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org