On 16 Jan 2014, at 6:53 am, Brian J. Murrell (brian) <br...@interlinx.bc.ca> wrote:
> On Wed, 2014-01-15 at 17:11 +1100, Andrew Beekhof wrote:
>>
>> Consider any long running action, such as starting a database.
>> We do not update the CIB until after actions have completed, so there can
>> and will be times when the status section is out of date to one degree or
>> another.
>
> But that is the opposite of what I am reporting

I know, I was giving you another example of when the CIB is not completely up-to-date with reality.

> and is acceptable. It's
> acceptable for a resource that is in the process of starting being
> reported as stopped, because it's not yet started.

It may very well be partially started.
It's almost certainly not stopped, which is what is being reported.

>
> What I am seeing is resources being reported as stopped when they are in
> fact started/running and have been for a long time.
>
>> At node startup is another point at which the status could potentially be
>> behind.
>
> Right. Which is the case I am talking about.
>
>> It sounds to me like you're trying to second guess the cluster, which is a
>> dangerous path.
>
> No, not trying to second guess at all.

You're not using the output to decide whether to perform some logic?
Because crm_mon is the more usual command to run right after startup (it would give you enough context to know things are still syncing).

> I'm just trying to ask the
> cluster what the state is and not getting the truth. I am willing to
> believe whatever state the cluster says it's in as long as what I am
> getting is the truth.
>
>> What if it's the first node to start up?
>
> I'd think a timeout comes into play here.
>
>> There'd be no fresh copy to arrive in that case.
>
> I can't say that I know how the CIB works internally/entirely, but I'd
> imagine that when a cluster node starts up it tries to see if there is a
> more fresh CIB out there in the cluster.

Nope.

> Maybe this is part of the
> process of choosing/discovering a DC.

DC election happens at the crmd.
The CIB is a dumb repository of name/value pairs.
It doesn't even understand new vs. old, only different.

> But ultimately if the node is the
> first one up, it will eventually figure that out so that it can nominate
> itself as the DC. Or it finds out that there is a DC already (and gets
> a fresh CIB from it?). It's during that window that I propose that
> crm_resource should not be asserting anything and should just admit that
> it does not (yet) know.
>
>> If it had enough information to know it was out of date, it wouldn't be out
>> of date.
>
> But surely it understands if it is in the process of joining a cluster
> or not, and therefore does know enough to know that it doesn't know if
> it's out of date or not.

And if it has a newer config compared to the existing nodes?

> But that it could be.
>
>> As above, there are situations when you'd never get an answer.
>
> I should have added to my proposal "or has determined that there is
> nothing to refresh its CIB from" and that its local copy is
> authoritative for the whole cluster.
>
> b.
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org