> > Specifically, I am not clear on whether 'convergence' means:
> > (a) Heat continues to respect the dependency graph but does not stop
> > after one traversal, instead repeatedly processing it until (and even
> > after) the stack is complete; or
> > (b) Heat ignores the dependency graph and just throws everything
> > against the wall, repeating until it has all stuck.
>
> I think (c). We still have the graph driving "what to do next", so that
> things are more likely to stick. Also, we don't want to do 10,000
> instance creations if the database they need isn't going to become
> available.
>
> But we decouple "I need to do something" from "the user asked for
> something" by allowing the convergence engine to act on notifications
> from the observer engine. In addition to allowing more automated
> actions, it should allow us to use finer-grained locking, because no
> individual task will need to depend on the whole graph or stack. If an
> operator comes along and changes templates or parameters, we can still
> complete our outdated action. Eventually convergence will arrive at a
> state that matches the desired stack.
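To make the decoupling concrete, here is a minimal sketch of a convergence worker driven by observer notifications rather than directly by user requests, with one lock per resource instead of a stack-wide lock. All names (ConvergenceWorker, notify, converge, etc.) are illustrative assumptions, not real Heat APIs:

```python
# Hypothetical sketch: the convergence worker reacts to observer
# notifications and locks only the resource it is acting on, so no
# task depends on the whole graph or stack. Not real Heat code.
import threading
import queue


class ConvergenceWorker:
    def __init__(self, desired_state):
        self.desired = desired_state   # resource name -> desired state
        self.observed = {}             # resource name -> observed state
        self.locks = {}                # per-resource locks
        self.events = queue.Queue()    # notifications from the observer

    def _lock_for(self, name):
        # Finer-grained locking: one lock per resource, not per stack.
        return self.locks.setdefault(name, threading.Lock())

    def notify(self, name, observed_state):
        # Called by the observer engine when reality changes; this is
        # decoupled from "the user asked for something".
        self.events.put((name, observed_state))

    def run_once(self):
        # Handle one notification: diff observed vs. desired and act.
        name, state = self.events.get()
        with self._lock_for(name):
            self.observed[name] = state
            if state != self.desired.get(name):
                self.converge(name)

    def converge(self, name):
        # In reality this would call out to Nova/Neutron/etc.; here we
        # just record that reality now matches the desired template.
        self.observed[name] = self.desired[name]
```

The point of the per-resource lock is that an operator changing the template only contends on the resources actually being touched, so an outdated action can still run to completion.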
There could be livelocks or deadlocks if the granularity becomes smaller.
We need some governing design to avoid them before we find the system too
difficult to debug.

> > I also have doubts about the principle "Users should only need to
> > intervene with a stack when there is no right action that Heat can
> > take to deliver the current template+parameters". That sounds good in
> > theory, but in practice it's very hard to know when there is a right
> > action Heat can take and when there isn't. e.g. There are innumerable
> > ways to create a template that can _never_ actually converge, and I
> > don't believe there's a general way we can detect that, only the hard
> > way: one error type at a time, for every single resource type.
> > Offering users a way to control how and when that happens allows them
> > to make the best decisions for their particular circumstances - and
> > hopefully a future WFaaS like Mistral will make it easy to set up
> > continuous monitoring for those who require it. (Not incidentally, it
> > also gives cloud operators an opportunity to charge their users in
> > proportion to their actual requirements.)
>
> There are some obvious cases where there _is_ a clear automated answer
> that does not require me to defer to a user's special workflow. 503 or
> 429 (I know, not ratified yet) status codes mean I should retry, perhaps
> after backing off a bit. If I get an ERROR state on a Nova VM, I should
> retry a few times before giving up.

+1 on this.

> The point isn't that we have all the answers; it is that there are
> plenty of places where we do have good answers that will serve most
> users well.

Right. I would expect all resources in Heat to be wrapped (encapsulated)
well enough that they know how to handle most events. In some cases,
additional hints are expected/needed from the events. If a resource
doesn't know how to respond to an event, we provide a default
(well-defined) propagation path for the message.
Assuming this can be done, we only have to deal with some macro-level
complexities where an external workflow is needed.

> This obsoletes that. We don't need to keep track if we adopt a
> convergence model. The template that the user has asked for is the
> template we converge on. The diff between that and reality dictates the
> changes we need to make. Whatever convergence step was last triggered
> can just be cancelled by the new one.

Seems that we need a protocol for cancelling an operation then ...

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev