Hi Andrew,

On 02.06.2014 02:57, Andrew Beekhof wrote:
>> This seems to be some kind of a race condition: I added
>>   sleep 3
>> to a central point in /usr/lib/ocf/resource.d/linbit/drbd.
>
> Define central?

=======================================================================
$ diff -u drbd.orig drbd
--- drbd.orig   2014-06-11 14:02:57.000000000 +0200
+++ drbd        2014-06-10 16:37:59.000000000 +0200
@@ -1047,6 +1047,11 @@
 # Everything except usage and meta-data must pass the validate test
 drbd_validate_all || exit
 
+if $USE_DEBUG_LOG ; then
+  echo OCF_ACTION=$__OCF_ACTION `date` >&9
+  sleep 3
+fi
+
 case $__OCF_ACTION in
 start)
     drbd_start
=======================================================================

>> 1.) Note the parallel "start" at 15:46:53. This could very well end up
>> in a race condition without "sleep 3".
>>
>> 2.) Why is pacemaker doing "stop/start" at all on korfwf02?
>
> This seems to correspond to
>
> May 23 13:29:31 korfwm01 pengine[5140]: notice: LogActions: Move
> stonith-korfwf02 (Started korfwm01 -> korfwf01)
> May 23 13:29:31 korfwm01 pengine[5140]: notice: LogActions: Move ALL-ffm
> (Started korfwf02 -> korfwf01)
> May 23 13:29:31 korfwm01 pengine[5140]: notice: LogActions: Demote
> DRBD-ffm:0 (Master -> Slave korfwf02)
> May 23 13:29:31 korfwm01 pengine[5140]: notice: LogActions: Restart
> DRBD-ffm:0 (Slave korfwf02)
> May 23 13:29:31 korfwm01 pengine[5140]: notice: LogActions: Start
> DRBD-ffm:1 (korfwf01)
> May 23 13:29:31 korfwm01 pengine[5140]: notice: LogActions: Promote
> DRBD-ffm:1 (Stopped -> Master korfwf01)
> May 23 13:29:31 korfwm01 pengine[5140]: notice: process_pe_message:
> Calculated Transition 843: /var/lib/pacemaker/pengine/pe-input-728.bz2
>
> from your original tarball.
>
> In that case, the cause is:
>
> <rsc_order id="ord-ALL-ffm-before-DRBD-ffm" score="INFINITY"
> first="ALL-ffm" then="ms-DRBD-ffm"/>
>
> Which requires that ms-DRBD-ffm be completely stopped if ALL-ffm is stopped
> (which it is because its being moved to 01).
> Perhaps you meant this?
>
> <rsc_order id="ord-ALL-ffm-before-DRBD-ffm" score="INFINITY"
> first="ALL-ffm" then="ms-DRBD-ffm" then-action="promote"/>

I tried that. It triggered another race condition.

=======================================================================
primitive DRBD-ffm ocf:linbit:drbd params drbd_resource=ffm \
    op start interval=0 timeout=240 \
    op promote interval=0 timeout=90 \
    op demote interval=0 timeout=90 \
    op notify interval=0 timeout=90 \
    op stop interval=0 timeout=100 \
    op monitor role=Slave timeout=20 interval=20 \
    op monitor role=Master timeout=20 interval=10
ms ms-DRBD-ffm DRBD-ffm meta master-max=1 master-node-max=1 \
    clone-max=2 clone-node-max=1 notify=true
colocation coloc-ms-DRBD-ffm-follows-ALL-ffm inf: \
    ms-DRBD-ffm:Master ALL-ffm
order ord-ALL-ffm-before-DRBD-ffm inf: ALL-ffm ms-DRBD-ffm:promote
location loc-ms-DRBD-ffm-korfwm01 ms-DRBD-ffm -inf: korfwm01
location loc-ms-DRBD-ffm-korfwm02 ms-DRBD-ffm -inf: korfwm02
=======================================================================
# crm node standby korfwf01 ; sleep 10
# crm node online korfwf01 ; sleep 10
# crm resource move ALL-ffm korfwf01 ; sleep 10
# crm node standby korfwf01 ; sleep 10
# crm node online korfwf01 ; sleep 10

*bang* split-brain.

This happens because, after the last command ("online korfwf01"), pacemaker
starts ms-DRBD-ffm and then immediately promotes it, without giving DRBD any
time to connect to and sync with the peer.
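To illustrate the gap between "start" and "promote" described above, here is a
minimal shell sketch of a wait loop that polls the DRBD connection state until
the peer is reachable. It is only an illustration, not part of the linbit
resource agent and not a fix proposed in this thread. The function name
wait_for_connected and the CSTATE_CMD hook are hypothetical; the default
command "drbdadm cstate <resource>" is assumed to print a single state such as
WFConnection or Connected. Sync states (SyncSource, SyncTarget) are
deliberately not treated as ready here.

```sh
#!/bin/sh
# Sketch only: wait until a DRBD resource reports "Connected".
# CSTATE_CMD is a hypothetical hook so the loop can be exercised without
# DRBD installed; by default it runs "drbdadm cstate".
: "${CSTATE_CMD:=drbdadm cstate}"

# wait_for_connected RESOURCE [TIMEOUT_SECONDS]
# Returns 0 as soon as the connection state is "Connected",
# 1 if it is still something else after TIMEOUT_SECONDS (default 10).
wait_for_connected() {
    res=$1
    timeout=${2:-10}
    elapsed=0
    while :; do
        # Intentionally unquoted so "drbdadm cstate" splits into two words.
        state=$($CSTATE_CMD "$res" 2>/dev/null)
        [ "$state" = "Connected" ] && return 0
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

A caller could run, for example, "wait_for_connected ffm 10" between start and
promote and refuse to promote on a non-zero return. Whether such a wait belongs
in the agent or in pacemaker's ordering is exactly the open question here.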
Look at this log excerpt:

14:16:16 korfwf01 drbd ffm: Starting worker thread (from drbdsetup [30742])
14:16:16 korfwf01 block drbd7: disk( Diskless -> Attaching )
14:16:16 korfwf01 block drbd7: disk( Attaching -> UpToDate )
14:16:16 korfwf01 drbd ffm: conn( StandAlone -> Unconnected )
14:16:16 korfwf01 drbd ffm: conn( Unconnected -> WFConnection )
14:16:16 korfwf01 block drbd7: role( Secondary -> Primary )
14:16:16 korfwf01 drbd ffm: conn( WFConnection -> WFReportParams )
14:16:17 korfwf01 notify-split-brain.sh[30933]: invoked for ffm/0 (drbd7)

After "start", korfwf01 progresses only as far as WFConnection; it does not
know anything about the state of korfwf02 yet. Then comes "promote", and
korfwf01 changes to Primary. Only after that do both nodes connect, and
korfwf01 learns that korfwf02 has been Primary in the meantime -> split brain.

This does not happen in the first "standby/online/move" cycle because of the
"sleep 10" between "online" and "move", which leaves some time between "start"
and "promote" for both nodes to reconnect.

I have attached the crm_report to
http://bugs.clusterlabs.org/show_bug.cgi?id=5217

Kind regards,
Robert

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org