You don't have real fencing configured, by the looks of it. Without
real, working fencing, recovery can be unpredictable. Can you set that
up and see if the problem goes away?
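For the DRBD side, here is a rough sketch of what that could look like. It assumes DRBD 8.4 (where "fencing" lives in the disk section, as in your file) and a stonith device in Pacemaker that can really power off the peer, which fence_dummy cannot. Only the fencing policy changes; the handlers are the ones you already have, and the rest of disk0.res stays as posted:

```
resource disk0 {
    disk {
        on-io-error detach;
        # resource-only just calls the fence-peer handler.
        # resource-and-stonith also freezes I/O on the disconnected
        # primary until the situation is resolved.
        fencing resource-and-stonith;
    }
    handlers {
        # Unchanged from the posted config.
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh; /usr/lib/drbd/crm-unfence-peer.sh";
    }
}
```

Running "drbdadm adjust disk0" on both nodes should pick up the change. Treat this as a starting point; it is untested against your setup.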
digimer
On 23/09/14 09:59 AM, Carsten Otto wrote:
On Tue, Sep 23, 2014 at 09:50:12AM -0400, Digimer wrote:
Can you share your pacemaker and drbd configurations please?
drbd.d/global_common.conf:
global {
    usage-count no;
}
common {
    protocol C;
    handlers {
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
    }
}
drbd.d/disk0.res:
resource disk0 {
    syncer {
        rate 10M;
        csums-alg sha1;
    }
    disk {
        on-io-error detach;
        fencing resource-only;
    }
    handlers {
        before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh";
        after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh; /usr/lib/drbd/crm-unfence-peer.sh";
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    }
    net {
        after-sb-0pri discard-younger-primary;
        after-sb-1pri discard-secondary;
        after-sb-2pri call-pri-lost-after-sb;
    }
    device /dev/drbd0;
    disk /dev/centos/drbd-lv;
    meta-disk internal;
    on node_a {
        address 192.168.69.89:7789;
    }
    on node_b {
        address 192.168.69.90:7789;
    }
}
pcs resource --full:
Master: DRBD_MASTER
  Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true failure-timeout=60sec
  Resource: DRBD (class=ocf provider=linbit type=drbd)
    Attributes: drbd_resource=disk0
    Meta Attrs: failure-timeout=60sec
    Operations: start interval=0s timeout=240 (DRBD-start-timeout-240)
                promote interval=0s timeout=90 (DRBD-promote-timeout-90)
                demote interval=0s timeout=90 (DRBD-demote-timeout-90)
                stop interval=0s timeout=100 (DRBD-stop-timeout-100)
                monitor interval=9 role=Master (DRBD-monitor-interval-9)
                monitor interval=11 role=Slave (DRBD-monitor-interval-11)
Group: GROUP
  Resource: VIP (class=ocf provider=heartbeat type=IPaddr2)
    Attributes: ip=192.168.69.48 cidr_netmask=32
    Meta Attrs: failure-timeout=60sec
    Operations: start interval=0s timeout=20s (VIP-start-timeout-20s)
                stop interval=0s timeout=5s (VIP-stop-timeout-5s)
                monitor interval=10sec (VIP-monitor-interval-10sec)
  Resource: FS (class=ocf provider=heartbeat type=Filesystem)
    Attributes: device=/dev/drbd0 directory=/mnt/drbd options=noatime,nodiratime fstype=ext4
    Meta Attrs: failure-timeout=60sec
    Operations: start interval=0s timeout=60 (FS-start-timeout-60)
                stop interval=0s timeout=10s (FS-stop-timeout-10s)
                monitor interval=5sec (FS-monitor-interval-5sec)
  Resource: PGSQL (class=ocf provider=heartbeat type=pgsql)
    Meta Attrs: failure-timeout=60sec
    Operations: start interval=0s timeout=120 (PGSQL-start-timeout-120)
                stop interval=0s timeout=120 (PGSQL-stop-timeout-120)
                promote interval=0s timeout=120 (PGSQL-promote-timeout-120)
                demote interval=0s timeout=120 (PGSQL-demote-timeout-120)
                monitor interval=10sec (PGSQL-monitor-interval-10sec)
  Resource: ASTERISK (class=ocf provider=heartbeat type=asterisk)
    Meta Attrs: failure-timeout=60sec
    Operations: start interval=0s timeout=20 (ASTERISK-start-timeout-20)
                monitor interval=10sec (ASTERISK-monitor-interval-10sec)
                stop interval=0s timeout=1 (ASTERISK-stop-timeout-1)
  Resource: TOMCAT (class=ocf provider=heartbeat type=tomcat)
    Attributes: java_home=/usr/java/latest/ catalina_home=/usr/share/tomcat statusurl=http://localhost:8080/xxx/
    Meta Attrs: failure-timeout=60sec
    Operations: start interval=0s timeout=60s (TOMCAT-start-timeout-60s)
                stop interval=0s timeout=20s (TOMCAT-stop-timeout-20s)
                monitor interval=10sec (TOMCAT-monitor-interval-10sec)
pcs constraint --full:
Location Constraints:
  Resource: DRBD_MASTER
    Constraint: drbd-fence-by-handler-disk0-DRBD_MASTER
      Rule: score=-INFINITY role=Master (id:drbd-fence-by-handler-disk0-rule-DRBD_MASTER)
        Expression: #uname ne node_a (id:drbd-fence-by-handler-disk0-expr-DRBD_MASTER)
  Resource: STONITH_A
    Disabled on: node_b (score:-INFINITY) (id:location-STONITH_A-node_b--INFINITY)
Ordering Constraints:
  promote DRBD_MASTER then start GROUP (Mandatory) (id:order-DRBD_MASTER-GROUP-mandatory)
Colocation Constraints:
  GROUP with DRBD_MASTER (INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-GROUP-DRBD_MASTER-INFINITY)
pcs stonith --full:
STONITH_A (stonith:fence_dummy): Started
Resource: STONITH_A (class=stonith type=fence_dummy)
  Attributes: passwd=x pcmk_host_list=node_b
  Operations: monitor interval=60s (STONITH_A-monitor-interval-60s)
[Note: The problem also happens without stonith and with a proper stonith
configuration on both nodes!]
pcs property:
Cluster Properties:
  cluster-infrastructure: corosync
  cluster-recheck-interval: 5min
  dc-version: 1.1.10-32.el7_0-368c726
  last-lrm-refresh: 1411475550
  no-quorum-policy: ignore
  stonith-enabled: true
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org