Referring this to the king of DRBD...
Lars, question for you inline.

On 11 Jun 2014, at 11:14 pm, Robert Dahlem <robert.dah...@gmx.net> wrote:

> Hi Andrew,
> 
> On 02.06.2014 02:57, Andrew Beekhof wrote:
> 
>>> This seems to be some kind of a race condition: I added
>>>     sleep 3
>>> to a central point in /usr/lib/ocf/resource.d/linbit/drbd.
>> 
>> Define central?
> 
> =======================================================================
> $ diff -u drbd.orig drbd
> --- drbd.orig    2014-06-11 14:02:57.000000000 +0200
> +++ drbd 2014-06-10 16:37:59.000000000 +0200
> @@ -1047,6 +1047,11 @@
> # Everything except usage and meta-data must pass the validate test
> drbd_validate_all || exit
> 
> +if $USE_DEBUG_LOG ; then
> +       echo OCF_ACTION=$__OCF_ACTION `date` >&9
> +       sleep 3
> +fi
> +
> case $__OCF_ACTION in
> start)
>        drbd_start
> =======================================================================
> 
>>> 1.) Note the parallel "start" at 15:46:53. This could very well end up
>>> in a race condition without "sleep 3".
>>> 
>>> 2.) Why is pacemaker doing "stop/start" at all on korfwf02?
>> 
>> This seems to correspond to 
>> 
>> May 23 13:29:31 korfwm01 pengine[5140]:   notice: LogActions: Move    stonith-korfwf02       (Started korfwm01 -> korfwf01)
>> May 23 13:29:31 korfwm01 pengine[5140]:   notice: LogActions: Move    ALL-ffm        (Started korfwf02 -> korfwf01)
>> May 23 13:29:31 korfwm01 pengine[5140]:   notice: LogActions: Demote  DRBD-ffm:0     (Master -> Slave korfwf02)
>> May 23 13:29:31 korfwm01 pengine[5140]:   notice: LogActions: Restart DRBD-ffm:0     (Slave korfwf02)
>> May 23 13:29:31 korfwm01 pengine[5140]:   notice: LogActions: Start   DRBD-ffm:1     (korfwf01)
>> May 23 13:29:31 korfwm01 pengine[5140]:   notice: LogActions: Promote DRBD-ffm:1     (Stopped -> Master korfwf01)
>> May 23 13:29:31 korfwm01 pengine[5140]:   notice: process_pe_message: Calculated Transition 843: /var/lib/pacemaker/pengine/pe-input-728.bz2
>> 
>> from your original tarball.
>> 
>> In that case, the cause is:
>> 
>>      <rsc_order id="ord-ALL-ffm-before-DRBD-ffm" score="INFINITY" first="ALL-ffm" then="ms-DRBD-ffm"/>
>> 
>> Which requires that ms-DRBD-ffm be completely stopped if ALL-ffm is stopped
>> (which it is, because it's being moved to korfwf01).
>> Perhaps you meant this?
>> 
>>      <rsc_order id="ord-ALL-ffm-before-DRBD-ffm" score="INFINITY" first="ALL-ffm" then="ms-DRBD-ffm" then-action="promote"/>
> 
> I tried that. It triggered another race condition.
> 
> =======================================================================
> primitive DRBD-ffm ocf:linbit:drbd params drbd_resource=ffm \
> op start interval=0 timeout=240 \
> op promote interval=0 timeout=90 \
> op demote interval=0 timeout=90 \
> op notify interval=0 timeout=90 \
> op stop interval=0 timeout=100 \
> op monitor role=Slave timeout=20 interval=20 \
> op monitor role=Master timeout=20 interval=10
> ms ms-DRBD-ffm DRBD-ffm meta master-max=1 master-node-max=1 \
> clone-max=2 clone-node-max=1 notify=true
> colocation coloc-ms-DRBD-ffm-follows-ALL-ffm inf: \
> ms-DRBD-ffm:Master ALL-ffm
> order ord-ALL-ffm-before-DRBD-ffm inf: ALL-ffm ms-DRBD-ffm:promote
> location loc-ms-DRBD-ffm-korfwm01 ms-DRBD-ffm -inf: korfwm01
> location loc-ms-DRBD-ffm-korfwm02 ms-DRBD-ffm -inf: korfwm02
> =======================================================================
> 
> # crm node standby korfwf01 ; sleep 10
> # crm node online korfwf01 ; sleep 10
> # crm resource move ALL-ffm korfwf01 ; sleep 10
> # crm node standby korfwf01 ; sleep 10
> # crm node online korfwf01 ; sleep 10
> *bang* split-brain.
> 
> This is because with the last command "online korfwf01" pacemaker starts
> and then immediately promotes ms-DRBD-ffm without giving any time for
> drbd to sync with the peer.

Have you seen anything like this before?
I don't know that we have any capacity to delay the promotion in the PE...
perhaps the agent needs to delay setting a master score if it's out of date?
Or maybe loop in the promote action and set a really long timeout.
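A minimal sketch of the second idea (looping inside promote), in the agent's own language. It assumes DRBD 8.x, where `drbdadm cstate <resource>` prints the connection state (e.g. `WFConnection`, `Connected`, `SyncSource`) and `drbdadm dstate <resource>` prints `<local>/<peer>` disk states (e.g. `UpToDate/DUnknown`). The function name `wait_for_peer_before_promote` and the `DRBDADM` override variable are hypothetical, not part of ocf:linbit:drbd. The point is to refuse promotion while the node is still in WFConnection, which is the window visible in the log excerpt quoted below.

```shell
#!/bin/bash
# Sketch (hypothetical helper, not part of ocf:linbit:drbd): wait inside the
# promote action until DRBD has completed its handshake with the peer.
# Assumes DRBD 8.x drbdadm output:
#   drbdadm cstate RES -> e.g. "WFConnection", "Connected", "SyncSource"
#   drbdadm dstate RES -> "<local>/<peer>", e.g. "UpToDate/DUnknown"

# Command used to query DRBD; overridable (hypothetical hook, e.g. for tests).
: "${DRBDADM:=drbdadm}"

# wait_for_peer_before_promote RESOURCE DEADLINE_SECONDS
# Returns 0 once the peer is connected (or this node is the resync source)
# and the local disk is UpToDate; returns 1 if DEADLINE_SECONDS pass first.
wait_for_peer_before_promote() {
    local res=$1 deadline=$2 waited=0 cstate dstate
    while [ "$waited" -lt "$deadline" ]; do
        cstate=$($DRBDADM cstate "$res" 2>/dev/null)
        dstate=$($DRBDADM dstate "$res" 2>/dev/null)
        case "$cstate" in
            Connected|SyncSource)
                # Only the local half of "local/peer" is checked here.
                if [ "${dstate%%/*}" = UpToDate ]; then
                    return 0
                fi
                ;;
        esac
        # Still WFConnection, WFReportParams, SyncTarget, ...: keep waiting.
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}
```

A promote action could call it as `wait_for_peer_before_promote "$OCF_RESKEY_drbd_resource" 60 || return $OCF_ERR_GENERIC`, keeping the deadline below the promote op timeout (90 s in the configuration quoted above). The catch is the deadline itself: if the peer is really dead, this refuses to promote at all, so a real version would need some way to tell "peer down" apart from "peer not connected yet".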

> Look at this log excerpt:
> 
> 14:16:16 korfwf01 drbd ffm: Starting worker thread (from drbdsetup [30742])
> 14:16:16 korfwf01 block drbd7: disk( Diskless -> Attaching )
> 14:16:16 korfwf01 block drbd7: disk( Attaching -> UpToDate )
> 14:16:16 korfwf01 drbd ffm: conn( StandAlone -> Unconnected )
> 14:16:16 korfwf01 drbd ffm: conn( Unconnected -> WFConnection )
> 14:16:16 korfwf01 block drbd7: role( Secondary -> Primary )
> 14:16:16 korfwf01 drbd ffm: conn( WFConnection -> WFReportParams )
> 14:16:17 korfwf01 notify-split-brain.sh[30933]: invoked for ffm/0 (drbd7)
> 
> After "start", korfwf01 progresses only as far as WFConnection; it does
> not know anything about the state of korfwf02 yet. Then comes "promote",
> and korfwf01 changes to Primary. Only after that do both nodes connect,
> and korfwf01 learns that korfwf02 has been Primary in the meantime -> split brain.
> 
> This does not happen in the first "standby/online/move" cycle because of
> the "sleep 10" between "online" and "move", which leaves time between
> "start" and "promote" for the two nodes to reconnect.
> 
> I have attached the crm_report to
>       http://bugs.clusterlabs.org/show_bug.cgi?id=5217
> 
> Kind regards,
> Robert
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
