On 13 Jun 2014, at 9:21 pm, Jason Hendry <jhen...@mintel.com> wrote:

> 
> Hi Everyone,
> 
> This is my first post, please let me know if I am missing any 
> standard/essential information to help with debugging...
> 
> I have a 2-node cluster with node-level fencing.  The cluster appears to be 
> configured with "Blind Faith", but my nodes are still killing each other if 
> the host is up while the cluster is not running on it. To reproduce this I:
> 
> Power on both nodes
> Stop the cluster on both nodes [pcs cluster stop]
> Start the cluster on a single node [pcs cluster start]
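
Spelled out, the reproduction above is roughly the following (a sketch only; the
--all form assumes a pcs new enough to support it, otherwise stop the cluster on
each node individually):

  # stop pacemaker/corosync on both nodes
  pcs cluster stop --all

  # bring the cluster back up on one node only
  pcs cluster start

  # with startup-fencing=false the peer is expected to show as OFFLINE here
  # rather than being fenced; the fencing in the logs below is the surprise
  pcs status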
> 
> After starting the cluster I get these messages in the cluster logs:
> 
> Jun 13 09:59:48 [15756] dev-drbd01.london.mintel.ad    pengine:  warning: 
> unpack_nodes: Blind faith: not fencing unseen nodes
> Jun 13 09:59:48 [15756] dev-drbd01.london.mintel.ad    pengine:     info: 
> determine_online_status_fencing: Node ha-nfs1 is active
> Jun 13 09:59:48 [15756] dev-drbd01.london.mintel.ad    pengine:     info: 
> determine_online_status: Node ha-nfs1 is online
> Jun 13 09:59:48 [15756] dev-drbd01.london.mintel.ad    pengine:  warning: 
> pe_fence_node: Node ha-nfs2 will be fenced because the peer has not been seen 
> by the cluster
> 
> Am I misunderstanding the meaning of "Blind faith", or is something 
> misconfigured?

Looks like you might have found a bug.
"Blind faith" is a particularly dangerous option to turn on, so it doesn't get 
tested very often.

A few lines further down in your logs should be a message from pengine that 
looks something like:

Jun 13 09:59:48 [15756] dev-drbd01.london.mintel.ad    pengine:  warning: 
process_pe_message: Calculated Transition ${X}: 
/var/lib/pacemaker/pengine/pe-warn-${Y}.bz2

If you can send us that file, I'll make sure it gets fixed.
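
In the meantime you can replay that file yourself to see the transition the
policy engine calculated. A rough sketch, assuming the pacemaker CLI tools are
installed and with ${Y} replaced by the number from your own logs:

  # re-run the policy engine against the saved snapshot and print the planned actions
  crm_simulate -S -x /var/lib/pacemaker/pengine/pe-warn-${Y}.bz2

  # add -s if you also want the placement scores
  crm_simulate -S -s -x /var/lib/pacemaker/pengine/pe-warn-${Y}.bz2

That should reproduce the "Blind faith" warning and the decision to fence
ha-nfs2 from the snapshot alone.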


>  Both my nodes are:
> 
> CentOS 6.5 (Final)  (uname -a:  Linux dev-drbd01.london.mintel.ad 
> 2.6.32-431.17.1.el6.x86_64 #1 SMP Wed May 7 23:32:49 UTC 2014 x86_64 x86_64 
> x86_64 GNU/Linux)
> pacemakerd --version  (Pacemaker 1.1.10-14.el6_5.3)
> 
> Here is my cluster configuration:
> 
> 
> pcs resource create nfsDRBD       ocf:linbit:drbd           drbd_resource=nfs 
> op monitor interval=8s meta migration-thresholds=0
> pcs resource create nfsLVM        ocf:heartbeat:LVM         
> volgrpname="vg_drbd" op monitor interval=7s meta migration-thresholds=0
> pcs resource create nfsDir        ocf:heartbeat:Filesystem  
> device=/dev/vg_drbd/lv_nfs_home directory=/data/nfs/home fstype=ext4 
> run_fsck=force op monitor interval=6s meta migration-thresholds=0
> pcs resource create nfsService    lsb:nfs op monitor        interval=5s meta 
> migration-thresholds=0
> pcs resource create nfsIP         ocf:heartbeat:IPaddr2     ip=a.b.c.d 
> cidr_netmask=32 op monitor interval=9s meta migration-thresholds=0
> pcs resource create network_ping  ocf:pacemaker:ping        name=network_ping 
> multiplier=5 host_list="a.b.c.d w.x.y.z" attempts=3 timeout=1 
> failure_score=10 op monitor interval=4s
> pcs resource clone  network_ping                            op meta 
> interleave=true
> 
> pcs resource master nfsDRBD_ms nfsDRBD master-max=1 master-node-max=1 
> clone-max=2 clone-node-max=1 notify=true target-role=Started is-managed=true
> pcs resource group add nfsGroup nfsLVM nfsDir nfsService nfsIP
> 
> pcs constraint order promote nfsDRBD_ms then start nfsGroup kind=Mandatory 
> symmetrical=false
> pcs constraint order stop nfsGroup then demote nfsDRBD_ms kind=Optional 
> symmetrical=false
> pcs constraint colocation add nfsGroup with master nfsDRBD_ms INFINITY
> 
> pcs property set no-quorum-policy=ignore
> pcs property set expected-quorum-votes=1
> pcs property set stonith-enabled=true
> pcs property set default-resource-stickiness=200
> pcs property set batch-limit=1
> pcs property set startup-fencing=false
> 
> pcs stonith create ha-nfs1_poweroff fence_virsh action=off ipaddr=a.b.c.d 
> login=stonith secure=yes identity_file=/data/stonith_id_rsa 
> port=dev-drbd01.london pcmk_host_map="ha-nfs1:dev-drbd01.london" op meta 
> priority=200
> pcs stonith create ha-nfs2_poweroff fence_virsh action=off ipaddr=w.x.y.z 
> login=stonith secure=yes identity_file=/data/stonith_id_rsa 
> port=dev-drbd02.london pcmk_host_map="ha-nfs2:dev-drbd02.london" op meta 
> priority=200
> 
> pcs stonith level add 1 ha-nfs1 ha-nfs1_poweroff
> pcs stonith level add 1 ha-nfs2 ha-nfs2_poweroff
> 
> pcs constraint location ha-nfs1_poweroff prefers ha-nfs1=-INFINITY
> pcs constraint location ha-nfs2_poweroff prefers ha-nfs2=-INFINITY
> pcs constraint location nfsDRBD rule role=Master defined network_ping
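
One thing worth double-checking, given the behaviour above, is that
startup-fencing=false really took effect in the CIB. A quick sketch, assuming
the stock pcs/pacemaker command-line tools:

  # show the cluster properties that have been set explicitly
  pcs property list | grep startup-fencing

  # or query the CIB directly
  crm_attribute --type crm_config --name startup-fencing --query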
> 
> Jason H
> 


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
