Re: [Pacemaker] Recovery after lost quorum

Andrew Beekhof Tue, 04 Jun 2013 17:21:59 -0700

On 05/06/2013, at 9:22 AM, Denis Witt <denis.w...@concepts-and-training.de> wrote:


> 
> On 05.06.2013 at 00:52, Andrew Beekhof <and...@beekhof.net> wrote:
> 
>>> been restored the resources aren't restarted. Running crm_resource -P
>>> brings everything up, but of course it would be nice if this happened
>>> automatically. Is there any way to achieve this?
>> 
>> It should happen automatically.
>> Logs?
> 
> Hi Andrew,
> 
> thanks for your reply.
> 
> Here are the logs:
> 

[snip]

> Jun  5 01:11:06 test4 pengine: [18625]: WARN: cluster_status: We do not have quorum - fencing and resource management disabled
> Jun  5 01:11:06 test4 pengine: [18625]: notice: LogActions: Start   pingtest:0#011(test4 - blocked)
> Jun  5 01:11:06 test4 pengine: [18625]: notice: LogActions: Start   drbd:0#011(test4 - blocked)

Here's your reason.  We didn't get quorum until:

> Jun  5 01:11:11 test4 crmd: [18626]: notice: ais_dispatch_message: Membership 128: quorum acquired

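To make the timing explicit, here is a minimal Python sketch that pulls the two quorum-related lines out of the log and reports the gap between them. The helper names (parse_stamp, quorum_events, seconds_without_quorum) and the year constant are assumptions made for illustration; syslog omits the year, so it is taken from the date of this mail. It only matches the two message texts quoted above, not every quorum message Pacemaker can emit.

```python
from datetime import datetime

# Two lines copied from the logs quoted in this thread
# (rejoined where the mail client wrapped them).
LOG_LINES = [
    "Jun  5 01:11:06 test4 pengine: [18625]: WARN: cluster_status: "
    "We do not have quorum - fencing and resource management disabled",
    "Jun  5 01:11:11 test4 crmd: [18626]: notice: ais_dispatch_message: "
    "Membership 128: quorum acquired",
]

ASSUMED_YEAR = 2013  # assumption: syslog has no year, taken from the mail date


def parse_stamp(line):
    """Parse the leading syslog timestamp ('Jun  5 01:11:06') of a log line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def quorum_events(lines):
    """Return (timestamp, kind) pairs for the quorum messages, oldest first."""
    events = []
    for line in lines:
        if "We do not have quorum" in line:
            events.append((parse_stamp(line), "no-quorum"))
        elif "quorum acquired" in line:
            events.append((parse_stamp(line), "acquired"))
    return sorted(events)


def seconds_without_quorum(lines):
    """Seconds from the last no-quorum warning to the next 'quorum acquired'."""
    last_warning = None
    for when, kind in quorum_events(lines):
        if kind == "no-quorum":
            last_warning = when
        elif last_warning is not None:
            return (when - last_warning).total_seconds()
    return None  # no warning followed by an acquisition was found


if __name__ == "__main__":
    print(seconds_without_quorum(LOG_LINES))
```

With the two lines above, the policy engine made its "blocked" decision five seconds before crmd reported that quorum had been acquired.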
[snip]

> 
> Please note that at the moment only two of the three nodes are 
> online, but quorum is established,

Actually not.


> as expected. Both nodes are running corosync and pacemaker, but the second 
> node didn't have any of the configured resources (so I got "not installed" 
> errors there; usually pacemaker is disabled on this node). The resources 
> also aren't started if pacemaker is disabled on this node (only corosync running).
> 
> analysis.txt from hb_report states:
> 
> Log patterns:
> Jun  5 01:14:11 test4 crmd: [18626]: ERROR: crm_timer_popped: Integration Timer (I_INTEGRATED) just popped in state S_INTEGRATION! (180000ms)
> 
> My config:
> 
> node backup3 \
>       attributes standby="off"
> node test3
> node test4
> primitive apache lsb:apache2 \
>       op monitor interval="10" timeout="20" \
>       meta target-role="Started"
> primitive drbd ocf:linbit:drbd \
>       params drbd_resource="www_r0" \
>       op monitor interval="10"
> primitive fs_drbd ocf:heartbeat:Filesystem \
>       params device="/dev/drbd0" directory="/var/www" fstype="ext4" \
>       op monitor interval="5" \
>       meta target-role="Started"
> primitive pingtest ocf:pacemaker:ping \
>       params multiplier="1000" host_list="192.168.100.19" \
>       op monitor interval="5"
> primitive sip ocf:heartbeat:IPaddr2 \
>       params ip="192.168.100.30" nic="eth0" \
>       op monitor interval="10" timeout="20" \
>       meta target-role="Started"
> group grp_all sip fs_drbd apache
> ms ms_drbd drbd \
>       meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> clone clone_pingtest pingtest
> location loc_all_on_best_ping grp_all \
>       rule $id="loc_all_on_best_ping-rule" -inf: not_defined pingd or pingd lt 1000
> colocation coloc_all_on_drbd inf: grp_all ms_drbd:Master
> order order_all_after_drbd inf: ms_drbd:promote grp_all:start
> property $id="cib-bootstrap-options" \
>       dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>       cluster-infrastructure="openais" \
>       expected-quorum-votes="3" \
>       no-quorum-policy="stop" \
>       stonith-enabled="false" \
>       last-lrm-refresh="1370360692" \
>       default-resource-stickiness="100" \
>       maintenance-mode="false"
> 
> Best regards,
> Denis Witt
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


