On Wed, Sep 10, 2014 at 11:50:58AM +0200, Lars Ellenberg wrote:
> Hi Andrew (and others).
>
> For a certain use case (yes, I'm talking about DRBD "peer-fencing" on
> loss of the replication link), it would be nice to be able to say:
>
>   update some_attribute = some_attribute + 1 where some_attribute >= 0
>
>   delete some_attribute where some_attribute = 0
>
> Ok, that's not the classic cmpxchg(), more of an atomic_add();
> or similar enough.  With hopefully just a single cib roundtrip.
>
> Let me rephrase:
> Update attribute "this_is_pink" (for node-X with ID attr-ID):
>
> fail if said attr-ID exists elsewhere (not as the intended attribute
> at the intended place in the xml tree)
> (this comes for free already, I think)
>
> if it does not exist at all, assume it was present with current value 0
>
> if the current (or assumed current) value is >= 0, add 1
>
> if the current value is < 0, fail
>
> (optionally: return the new value? the old value?)
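Spelled out with today's tools, the requested operation amounts to the
following (deliberately racy) shell sketch -- node and attribute names are
placeholders; the point of the request is that this read-modify-write should
instead happen server-side, atomically, in a single cib roundtrip:

    # NON-atomic illustration of the requested atomic_add() semantics.
    atomic_add_sketch() {
        node=$1 name=$2
        # current value; treat a missing attribute as 0
        val=$(crm_attribute -N "$node" -n "$name" -G -q 2>/dev/null)
        val=${val:-0}
        # a negative value means: fail, don't touch anything
        [ "$val" -ge 0 ] || return 1
        # lost-update window between the query above and this update
        crm_attribute -N "$node" -n "$name" -v $((val + 1))
    }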
Did anyone read this?

> My intended use case scenario is this:
>
> Two DRBD nodes, several DRBD resources,
> at least a few of them in "dual-primary".
>
> The replication link breaks.
>
> Fence-peer handlers are triggered individually for each resource on
> both nodes, and try to concurrently modify the cib (place fencing
> constraints).
>
> With the current implementation of crm-fence-peer.sh, it is likely that
> some DRBD resources "win" on one node, and some "win" on the other node.
> The respective losers will have their IO blocked.
>
> Which means that, most likely, on both nodes some DRBD resource will
> stay blocked, some monitor operation will soon fail, some stop operation
> (to recover from the monitor failure) will soon fail, and the recovery
> from that will be node-level fencing of the affected node.
>
> In short: both nodes will be hard-reset,
> because of a replication link failure.
>
> If I instead used a single attribute (with a pre-determined ID) for all
> instances of the fence-peer handler, the first to run would "choose" the
> victim node, and all the others would just add to the count.
> There would be only one loser, and, more importantly: one survivor.
>
> Once the replication link is re-established,
> DRBD resynchronization will bring the former loser up-to-date,
> and the respective after-resync handlers will decrease that "breakage
> count".  Once the breakage count hits zero, it can and should be deleted.
>
> Presence of the "breakage count" attribute with a value > 0 would mean
> "this node must not be promoted", which would be a static constraint
> to be added to all DRBD resources (see the first sketch below).
>
> Does that make sense?
>
> (I have more insane proposals for the case where we have multiple (more
> than 2) Primaries during normal operation, but I'm not yet able to write
> them down without being seriously confused by myself...)
>
> I could open-code it with shell and cibadmin, btw.
> I did a proof-of-concept once (see the second sketch below) that does:
>   a. cibadmin -Q
>   b. some calculations;
>      it then prepares the update statement xml based on the cib content
>      seen, *including* the cib generation counters
>   c. cibadmin -R (or -C, -M, -D, as appropriate);
>      this will fail if the cib was modified in a relevant way since a.,
>      because of the included generation counters
>   d. repeat as necessary
>
> But that is beyond ugly.
> And probably fragile.
> And it would often fail for all the wrong reasons, just because some
> status code has changed and bumped the cib generation counters.
>
> What would be needed to add such functionality?
> Where would it go?
> cibadmin? cib? crm_attribute? possibly also attrd?
>
> Thanks,
> Lars

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting   http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
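A minimal sketch of that static "do not promote while broken" constraint,
assuming a master/slave DRBD resource named ms_drbd_r0 and a node attribute
named drbd-breakage-count (both names are placeholders):

    cibadmin -C -o constraints -X '
    <rsc_location id="loc-no-promote-while-broken" rsc="ms_drbd_r0">
      <rule id="loc-no-promote-while-broken-rule"
            role="Master" score="-INFINITY">
        <expression id="loc-no-promote-while-broken-expr"
                    attribute="drbd-breakage-count"
                    operation="gt" value="0" type="number"/>
      </rule>
    </rsc_location>'

With type="number" the comparison is numeric, so the rule keeps the Master
role away from a node only while its breakage count is above zero; once the
after-resync handler deletes the attribute, the rule no longer matches.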
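And a sketch of the proof-of-concept loop (steps a. through d. above); the
attribute name and the sed-based edit are illustrative only, and creating a
still-missing attribute is omitted for brevity:

    #!/bin/sh
    # Poor man's compare-and-swap: cibadmin -R submits the generation
    # counters (admin_epoch/epoch/num_updates) captured in step a., so
    # the cib rejects the replace if anything was committed in between.
    attr=breakage-count
    while :; do
        cibadmin -Q > /tmp/cib.$$ || exit 1            # a. snapshot
        old=$(xmllint --xpath \
             "string(//nvpair[@name='$attr']/@value)" /tmp/cib.$$)
        new=$(( ${old:-0} + 1 ))                       # b. compute
        sed "s/name=\"$attr\" value=\"$old\"/name=\"$attr\" value=\"$new\"/" \
            /tmp/cib.$$ > /tmp/cib.$$.new
        cibadmin -R -x /tmp/cib.$$.new && break        # c. conditional write
        sleep 1                                        # d. lost the race: retry
    done
    rm -f /tmp/cib.$$ /tmp/cib.$$.new

This also makes the "fail for all the wrong reasons" complaint concrete:
any unrelated status update bumps num_updates and forces another trip
through the loop.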