On 13 Jun 2014, at 9:21 pm, Jason Hendry <jhen...@mintel.com> wrote:
> Hi Everyone,
>
> This is my first post, please let me know if I am missing any
> standard/essential information to help with debugging...
>
> I have a 2-node cluster with node-level fencing. The cluster appears to
> be configured with "Blind Faith", but my nodes are still killing each
> other if the host is up while the cluster is not running on it. To
> reproduce this I:
>
> Power on both nodes
> Stop the cluster on both nodes [pcs cluster stop]
> Start the cluster on a single node [pcs cluster start]
>
> After starting the cluster I get this message in the cluster logs:
>
> Jun 13 09:59:48 [15756] dev-drbd01.london.mintel.ad pengine: warning:
> unpack_nodes: Blind faith: not fencing unseen nodes
> Jun 13 09:59:48 [15756] dev-drbd01.london.mintel.ad pengine: info:
> determine_online_status_fencing: Node ha-nfs1 is active
> Jun 13 09:59:48 [15756] dev-drbd01.london.mintel.ad pengine: info:
> determine_online_status: Node ha-nfs1 is online
> Jun 13 09:59:48 [15756] dev-drbd01.london.mintel.ad pengine: warning:
> pe_fence_node: Node ha-nfs2 will be fenced because the peer has not been
> seen by the cluster
>
> Am I misunderstanding the meaning of "Blind faith", or is something
> misconfigured?

Looks like you might have found a bug. "Blind faith" is a particularly
dangerous option to turn on, so it doesn't get tested very often.

A few lines further down in your logs there should be a message from
pengine that looks something like:

  Jun 13 09:59:48 [15756] dev-drbd01.london.mintel.ad pengine: warning:
  process_pe_message: Calculated Transition ${X}:
  /var/lib/pacemaker/pengine/pe-warn-${Y}.bz2

If you can send us that file I'll make sure it gets fixed.
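In the meantime, if you want to poke at it yourself, you should be able to
replay that saved policy-engine input locally with crm_simulate. A minimal
sketch (the pe-warn-0.bz2 name below is a placeholder, substitute whatever
file name your own logs point at):

  # Replay the saved pengine input without touching the live cluster
  crm_simulate --simulate --xml-file /var/lib/pacemaker/pengine/pe-warn-0.bz2

  # Add allocation scores to see why it decided to fence ha-nfs2
  crm_simulate --simulate --show-scores \
      --xml-file /var/lib/pacemaker/pengine/pe-warn-0.bz2

That should reproduce the same "Blind faith" / pe_fence_node decision you
saw in the logs, and that same file is exactly what we'd want attached to a
bug report.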
> Both my nodes are:
>
> Centos 6.5 (Final) (uname -a: Linux dev-drbd01.london.mintel.ad
> 2.6.32-431.17.1.el6.x86_64 #1 SMP Wed May 7 23:32:49 UTC 2014 x86_64
> x86_64 x86_64 GNU/Linux)
> pacemakerd --version ( Pacemaker 1.1.10-14.el6_5.3 )
>
> Here is my cluster configuration:
>
> pcs resource create nfsDRBD ocf:linbit:drbd drbd_resource=nfs
>   op monitor interval=8s meta migration-thresholds=0
> pcs resource create nfsLVM ocf:heartbeat:LVM volgrpname="vg_drbd"
>   op monitor interval=7s meta migration-thresholds=0
> pcs resource create nfsDir ocf:heartbeat:Filesystem
>   device=/dev/vg_drbd/lv_nfs_home directory=/data/nfs/home fstype=ext4
>   run_fsck=force op monitor interval=6s meta migration-thresholds=0
> pcs resource create nfsService lsb:nfs op monitor interval=5s
>   meta migration-thresholds=0
> pcs resource create nfsIP ocf:heartbeat:IPaddr2 ip=a.b.c.d
>   cidr_netmask=32 op monitor interval=9s meta migration-thresholds=0
> pcs resource create network_ping ocf:pacemaker:ping name=network_ping
>   multiplier=5 host_list="a.b.c.d w.x.y.z" attempts=3 timeout=1
>   failure_score=10 op monitor interval=4s
> pcs resource clone network_ping op meta interleave=true
>
> pcs resource master nfsDRBD_ms nfsDRBD master-max=1 master-node-max=1
>   clone-max=2 clone-node-max=1 notify=true target-role=Started
>   is-managed=true
> pcs resource group add nfsGroup nfsLVM nfsDir nfsService nfsIP
>
> pcs constraint order promote nfsDRBD_ms then start nfsGroup
>   kind=Mandatory symmetrical=false
> pcs constraint order stop nfsGroup then demote nfsDRBD_ms kind=Optional
>   symmetrical=false
> pcs constraint colocation add nfsGroup with master nfsDRBD_ms INFINITY
>
> pcs property set no-quorum-policy=ignore
> pcs property set expected-quorum-votes=1
> pcs property set stonith-enabled=true
> pcs property set default-resource-stickiness=200
> pcs property set batch-limit=1
> pcs property set startup-fencing=false
>
> pcs stonith create ha-nfs1_poweroff fence_virsh action=off ipaddr=a.b.c.d
>   login=stonith secure=yes identity_file=/data/stonith_id_rsa
>   port=dev-drbd01.london pcmk_host_map="ha-nfs1:dev-drbd01.london"
>   op meta priority=200
> pcs stonith create ha-nfs2_poweroff fence_virsh action=off ipaddr=w.x.y.z
>   login=stonith secure=yes identity_file=/data/stonith_id_rsa
>   port=dev-drbd02.london pcmk_host_map="ha-nfs2:dev-drbd02.london"
>   op meta priority=200
>
> pcs stonith level add 1 ha-nfs1 ha-nfs1_poweroff
> pcs stonith level add 1 ha-nfs2 ha-nfs2_poweroff
>
> pcs constraint location ha-nfs1_poweroff prefers ha-nfs1=-INFINITY
> pcs constraint location ha-nfs2_poweroff prefers ha-nfs2=-INFINITY
> pcs constraint location nfsDRBD rule role=Master defined network_ping
>
> Jason H
>
> Mintel Group Ltd | 11 Pilgrim Street | London | EC4V 6RN
> Registered in England: Number 1475918. | VAT Number: GB 232 9342 72
>
> Contact details for our other offices can be found at
> http://www.mintel.com/office-locations
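For what it's worth, your reading of "Blind faith" matches the intent: with
startup-fencing=false the cluster is supposed to leave nodes it has never
seen alone, so the pe_fence_node warning that follows looks like a bug
rather than a configuration problem on your side. If you want to rule out
the property somehow not making it into the CIB, a quick sanity check (a
sketch; run it on the node where the cluster is up):

  # Query the cluster-wide option straight from the CIB
  crm_attribute --type crm_config --name startup-fencing --query

  # Or list every property pcs believes is set
  pcs property list

Both should report startup-fencing=false if the pcs property set command
above took effect.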
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org