On Mon, May 2, 2011 at 10:26 PM, Lars Marowsky-Bree <l...@novell.com> wrote:
> On 2011-04-29T10:32:25, Andrew Beekhof <and...@beekhof.net> wrote:
>
>> With such a long email, assume agreement for anything I don't
>> explicitly complain about :-)
>
> Sorry :-) I'm actually trying to write this up into a somewhat more
> consistent document just now, which turns out to be surprisingly hard
> ... Not that easily structured. I assume anything is better than nothing
> though.
>
>> > It's an excellent question where the configuration of the Cluster Token
>> > Registry would reside; I'd assume that there would be a
>> > resource/primitive/clone (design not finished) that corresponds to the
>> > daemon instance,
>> A resource or another daemon like crmd/cib/etc?
>> Could go either way I guess.
>
> Part of my goal is to have this as an add-on on top of Pacemaker.
> Ideally, short of the few PE/CIB enhancements, I'd love it if Pacemaker
> wouldn't even have to know about this.
>
> The tickets clearly can only be acquired if the rest of the cluster is
> up already, so having this as a clone makes some sense, and provides
> some monitoring of the service itself. (Similar to how ocfs2_controld is
> managed.)

Yep, not disagreeing.
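Purely for illustration, such a clone might look something like this in
crm shell syntax - the "ctrd" agent name is invented here, since no such
resource agent exists yet:

  # hypothetical agent name; started and monitored like any other clone
  primitive ctr-daemon ocf:pacemaker:ctrd \
          op monitor interval="10s" timeout="20s"
  clone ctr-clone ctr-daemon \
          meta interleave="true"

Same pattern as the ocfs2_controld clone mentioned above: the cluster
only starts and monitors the daemon, it doesn't have to understand what
the daemon does.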
>> > (I think the word works - you can own a ticket, grant a ticket, cancel,
>> > and revoke tickets ...)
>> Maybe.
>> I think token is still valid though. Not like only one project in the
>> world uses heartbeats either.
>> (Naming a project after a generic term is another matter).
>
> I have had multiple people confused at the "token" word in the CTR and
> corosync contexts already. I just wanted to suggest to kill that as
> early as possible if we can ;-)

Shrug. I guess whoever writes it gets to choose :-)

>> > Site-internal partitioning is handled at exactly that level; only the
>> > winning/quorate partition will be running the CTR daemon and
>> > re-establish communication with the other CTR instances. It will fence
>> > the losers.
>>
>> Ah, so that's why you suggested it be a resource.
>
> Yes.
>
>> Question though... what about no-quorum-policy=ignore ?
>
> That was implicit somewhere later on, I think. The CTR must be able to
> cope with multiple partitions of the same site, and would only grant the
> T to one of them.

But you'll still have a (longer) time when both partitions will think
they own the token. Potentially long enough to begin starting resources.

>
>> > Probably it makes sense to add a layer of protection here to the CTR,
>> > though - if several partitions from the same site connect (which could,
>> > conceivably, happen), the CTRs will grant the ticket(s) only to the
>> > partition with the highest node count (or, should these be equal,
>> > lowest nodeid),
>> How about longest uptime instead? Possibly too variable?
>
> That would work too, this was just to illustrate that there needs to be
> a unique tie-breaker of last resort that is guaranteed to break said
> tie.
>
>> >> Additionally, when a split-brain happens, how about the existing
>> >> stonith mechanism. Should the partition without quorum be stonithed?
>> > Yes, just as before.
>> Wouldn't that depend on whether a deadman constraint existed for one
>> of the lost tickets?
>
> Well, like I said: just as before. We don't have to STONITH anything if
> we know that the nodes are clean. But, by the way, we still do, since we
> don't trust nodes which failed. So unless we change the algorithm, the
> partitions would get shot already, and nothing wrong with that ... Or
> differently put: CTR doesn't require any change of behaviour here.

I'm not arguing that, I'm just saying I don't think we need an
additional construct.

>> Isn't kind=deadman for ordering constraints redundant though?
>
> It's not required for this approach, as far as I can see, since this
> only needs it for the T dependencies. I don't really care what else it
> gets added to ;-)

I do :-)

>
>> > Andrew, Yan - do you think we should allow _values_ for tickets, or
>> > should they be strictly defined/undefined/set/unset?
>> Unclear. It might be nice to store the expiration (and/or last grant)
>> time in there for admin tools to do something with.
>> But that could mean a lot of spurious CIB updates, so maybe it's better
>> to build that into the ticket daemon's API.
>
> I think sometime later in the discussion I actually made a case for
> certain values.
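To illustrate the two options with made-up attribute names - none of
this is an existing CIB schema:

  <!-- strictly boolean: the ticket is either granted or not -->
  <ticket id="ticket-A" granted="true"/>

  <!-- with values: also carry e.g. the last grant and expiry times,
       so the admin tools have something to display -->
  <ticket id="ticket-A" granted="true"
          last-granted="1304373960" expires="1304374560"/>

The second form is roughly what the "expiration and/or last grant time"
idea above would amount to.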
>> > The ticket not being set/defined should be identical to the ticket being
>> > set to "false/no", as far as I can see - in either case, the ticket is
>> > not owned, so all resources associated with it _must_ be stopped, and
>> > may not be started again.
>> There is a startup issue though.
>> You don't want to go fencing yourself before you can start the daemon
>> and attempt to get the token.
>>
>> But the fencing logic would presumably only happen if you DON'T have
>> the ticket but DO have an affected resource active.
>
> Right. If you don't own anything that depends on the ticket that you
> haven't got, nothing happens.
>
> So no start-up issue - unless someone has misconfigured ticket-protected
> resources to be started outside the scope of Pacemaker, but that's
> deserved then ;-)

Yep. Just calling it out so that we have it written down somewhere.

>> > Good question. This came up above already briefly ...
>> >
>> > I _think_ there should be a special value that a ticket can be set to
>> > that doesn't fence, but stops everything cleanly.
>>
>> Again, wouldn't fencing only happen if a deadman dep made use of the ticket?
>
> Right, all of the above assumed that one actually had resources that
> depend on the ticket active. Otherwise, one wouldn't know which nodes to
> fence for this anyway.
>
>> Otherwise we probably want:
>> <token id=... loss-policy=(fence|stop|freeze) granted=(true|false) />
>>
>> with the daemon only updating the "granted" field.
>
> Yeah. What I wanted to hint at above though was an
> owned-policy=(start|stop) to allow admins to cleanly stop the services
> even while still owning the ticket - and still be able to recover from a
> revocation properly (i.e., still fencing active resources).
>
>> > (Tangent - ownership appears to belong to the status section; the value
>> > seems to belong to the cib->ticket section(?).)
>> Plausible - since you'd not want nodes to come up and think they have
>> tickets.
>> That would also negate my concern about including the expiration time
>> in the ticket.
>
> Right. One thing that ties into this here is the "how do tickets expire
> if the CTR dies on us", since then no one is around to revoke it from the
> CIB.
>
> I thought about handling this in the LRM, CIB, or PE (via the recheck
> interval), but they all suck. The cleanest and most reliable way seems
> to be to make death-of-ctr fatal for the nodes - just like
> ocfs2_controld or sbd via the watchdog.

agree

> But storing the acquisition time in the CIB probably is quite useful for
> the tools. I assume that typically we'll have <5 tickets around; an
> additional time stamp won't hurt us.

yep
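For reference, pulling the pieces discussed above together - purely
illustrative, sticking with the element name from the snippet above, and
none of the names or values are final:

  <!-- configuration section: defined by the admin, including what to do
       on loss of the ticket and the owned-policy idea above -->
  <token id="ticket-A" loss-policy="fence" owned-policy="start"/>

  <!-- status section: updated only by the CTR daemon - current
       ownership plus the acquisition time for the tools -->
  <token id="ticket-A" granted="true" last-granted="1304373960"/>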