Referring to the king of drbd... Lars, question for you inline.

On 11 Jun 2014, at 11:14 pm, Robert Dahlem <robert.dah...@gmx.net> wrote:
> Hi Andrew,
>
> On 02.06.2014 02:57, Andrew Beekhof wrote:
>
>>> This seems to be some kind of a race condition: I added
>>> sleep 3
>>> to a central point in /usr/lib/ocf/resource.d/linbit/drbd.
>>
>> Define central?
>
> =======================================================================
> $ diff -u drbd.orig drbd
> --- drbd.orig 2014-06-11 14:02:57.000000000 +0200
> +++ drbd 2014-06-10 16:37:59.000000000 +0200
> @@ -1047,6 +1047,11 @@
> # Everything except usage and meta-data must pass the validate test
> drbd_validate_all || exit
>
> +if $USE_DEBUG_LOG ; then
> + echo OCF_ACTION=$__OCF_ACTION `date` >&9
> + sleep 3
> +fi
> +
> case $__OCF_ACTION in
> start)
> drbd_start
> =======================================================================
>
>>> 1.) Note the parallel "start" at 15:46:53. This could very well end up
>>> in a race condition without "sleep 3".
>>>
>>> 2.) Why is pacemaker doing "stop/start" at all on korfwf02?
>>
>> This seems to correspond to
>>
>> May 23 13:29:31 korfwm01 pengine[5140]: notice: LogActions: Move
>> stonith-korfwf02 (Started korfwm01 -> korfwf01)
>> May 23 13:29:31 korfwm01 pengine[5140]: notice: LogActions: Move
>> ALL-ffm (Started korfwf02 -> korfwf01)
>> May 23 13:29:31 korfwm01 pengine[5140]: notice: LogActions: Demote
>> DRBD-ffm:0 (Master -> Slave korfwf02)
>> May 23 13:29:31 korfwm01 pengine[5140]: notice: LogActions: Restart
>> DRBD-ffm:0 (Slave korfwf02)
>> May 23 13:29:31 korfwm01 pengine[5140]: notice: LogActions: Start
>> DRBD-ffm:1 (korfwf01)
>> May 23 13:29:31 korfwm01 pengine[5140]: notice: LogActions: Promote
>> DRBD-ffm:1 (Stopped -> Master korfwf01)
>> May 23 13:29:31 korfwm01 pengine[5140]: notice: process_pe_message:
>> Calculated Transition 843: /var/lib/pacemaker/pengine/pe-input-728.bz2
>>
>> from your original tarball.
>>
>> In that case, the cause is:
>>
>> <rsc_order id="ord-ALL-ffm-before-DRBD-ffm" score="INFINITY"
>> first="ALL-ffm" then="ms-DRBD-ffm"/>
>>
>> Which requires that ms-DRBD-ffm be completely stopped if ALL-ffm is stopped
>> (which it is because it's being moved to 01).
>> Perhaps you meant this?
>>
>> <rsc_order id="ord-ALL-ffm-before-DRBD-ffm" score="INFINITY"
>> first="ALL-ffm" then="ms-DRBD-ffm" then-action="promote"/>
>
> I tried that. It triggered another race condition.
>
> =======================================================================
> primitive DRBD-ffm ocf:linbit:drbd params drbd_resource=ffm \
>     op start interval=0 timeout=240 \
>     op promote interval=0 timeout=90 \
>     op demote interval=0 timeout=90 \
>     op notify interval=0 timeout=90 \
>     op stop interval=0 timeout=100 \
>     op monitor role=Slave timeout=20 interval=20 \
>     op monitor role=Master timeout=20 interval=10
> ms ms-DRBD-ffm DRBD-ffm meta master-max=1 master-node-max=1 \
>     clone-max=2 clone-node-max=1 notify=true
> colocation coloc-ms-DRBD-ffm-follows-ALL-ffm inf: \
>     ms-DRBD-ffm:Master ALL-ffm
> order ord-ALL-ffm-before-DRBD-ffm inf: ALL-ffm ms-DRBD-ffm:promote
> location loc-ms-DRBD-ffm-korfwm01 ms-DRBD-ffm -inf: korfwm01
> location loc-ms-DRBD-ffm-korfwm02 ms-DRBD-ffm -inf: korfwm02
> =======================================================================
>
> # crm node standby korfwf01 ; sleep 10
> # crm node online korfwf01 ; sleep 10
> # crm resource move ALL-ffm korfwf01 ; sleep 10
> # crm node standby korfwf01 ; sleep 10
> # crm node online korfwf01 ; sleep 10
> *bang* split-brain.
>
> This is because with the last command "online korfwf01" pacemaker starts
> and then immediately promotes ms-DRBD-ffm without giving any time for
> drbd to sync with the peer.

Have you seen anything like this before?
I don't know that we have any capacity to delay the promotion in the PE...
perhaps the agent needs to delay setting a master score if it's out of date?
Or maybe loop in the promote action and set a really long timeout.

> Look at this log excerpt:
>
> 14:16:16 korfwf01 drbd ffm: Starting worker thread (from drbdsetup [30742])
> 14:16:16 korfwf01 block drbd7: disk( Diskless -> Attaching )
> 14:16:16 korfwf01 block drbd7: disk( Attaching -> UpToDate )
> 14:16:16 korfwf01 drbd ffm: conn( StandAlone -> Unconnected )
> 14:16:16 korfwf01 drbd ffm: conn( Unconnected -> WFConnection )
> 14:16:16 korfwf01 block drbd7: role( Secondary -> Primary )
> 14:16:16 korfwf01 drbd ffm: conn( WFConnection -> WFReportParams )
> 14:16:17 korfwf01 notify-split-brain.sh[30933]: invoked for ffm/0 (drbd7)
>
> After "start" korfwf01 progresses until WFConnection, it does not know
> anything about the state of korfwf02 yet. Then comes "promote", korfwf01
> changes to Primary. Only after that both nodes connect and korfwf01
> learns that korfwf02 has been Primary in the meantime -> split brain.
>
> This does not happen in the first "standby/online/move" cycle because of
> "sleep 10" between "online" and "move", thus allowing for some time
> between "start" and "promote" and for re-connection between both nodes.
>
> I have attached the crm_report to
> http://bugs.clusterlabs.org/show_bug.cgi?id=5217
>
> Kind regards,
> Robert
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
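
For illustration only, the "loop in the promote action" idea from the reply above might look roughly like the sketch below. It is not code from the linbit agent: the helper name wait_for_peer and the CSTATE_CMD override are made up for this example, and it assumes that `drbdadm cstate <resource>` prints the connection state on one line. The list of "connected" states is deliberately short and probably not exhaustive.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of /usr/lib/ocf/resource.d/linbit/drbd.
# Command used to query the connection state; overridable for testing.
: "${CSTATE_CMD:=drbdadm cstate}"

# wait_for_peer RESOURCE TIMEOUT_SECONDS
# Polls once per second until the resource reports a connected state.
# Returns 0 when connected, 1 if the timeout expires first.
wait_for_peer() {
    res=$1
    remaining=$2
    while :; do
        state=$($CSTATE_CMD "$res" 2>/dev/null)
        case $state in
            # Assumed subset of states that imply the peer is known.
            Connected|SyncSource|SyncTarget) return 0 ;;
        esac
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
}
```

A promote action could call something like `wait_for_peer ffm 30` before switching to Primary. What to do when it returns 1 is the open question: if the peer is genuinely down, promotion still has to be possible, so the timeout branch cannot simply refuse.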