On 10/01/2017 14:49, Sylvain Bauza wrote:
> Aloha folks,
>
> Recently, I was discussing with TripleO folks. Disclaimer: I don't think
> it's only a TripleO-related discussion, but rather a larger one for all
> our deployers.
>
> The question I was asked was how to upgrade from Newton to Ocata for the
> Placement API when the deployer is not yet running the Placement API in
> Newton (because it was optional in Newton).
>
> The quick answer was "easy, just upgrade the service and run the
> Placement API *before* the scheduler upgrade". That's because we're
> working on a change for the scheduler calling the Placement API instead
> of getting all the compute nodes [1].
>
> That said, I then thought about something else: wait, the Newton compute
> nodes work with the Placement API, cool. Cool, but what if the Placement
> API is not deployed, since it's optional in Newton? Then the Newton
> computes stop calling the Placement API because of a nice decorator [2]
> (which is fine by me).
>
> Now imagine the problem for the upgrade: given a deployer not running
> the Placement API in Newton, they would need to *first* deploy the
> (Newton or Ocata) Placement service, then SIGHUP all the Newton compute
> nodes so they report their resources (and create the inventories), then
> wait a few minutes until all the inventories are reported, and only then
> upgrade all the services (except the compute nodes, of course) to Ocata,
> including the scheduler service.
>
> The above looks like a different upgrade policy, right?
> - Either we say you need to run the Newton Placement service *before*
>   upgrading - and in that case, the Placement service is not optional
>   for Newton, right?
> - Or we say you need to run the Ocata Placement service and then restart
>   the compute nodes *before* upgrading the other services - and that's a
>   very different situation from the current upgrade procedure.
>
> For example, I know it's not a Nova thing, but most of our deployers
> distinguish "controller" vs. "compute" services, i.e. all the Nova
> services except the computes running on a single machine (or a few). In
> that case, the "controller" upgrade is monolithic and all the services
> are upgraded and restarted at the same stage. If so, it looks difficult
> for those deployers to just be asked to follow a very different
> procedure.
>
> Anyway, I think we need to carefully consider that, and probably find
> some solutions. For example, we could imagine (disclaimer #2: these are
> probably silly solutions, but they're the ones I'm thinking of now):
> - a DB migration for creating the inventories and allocations before
>   upgrading (i.e. not asking the computes to register themselves with
>   the Placement API). That would be terrible because it's a data
>   upgrade, I know...
> - having the scheduler be backwards compatible in [1], i.e. trying to
>   call the Placement API to get the list of resource providers and
>   falling back to calling all the ComputeNodes if that's not possible.
>   But that would mean that the Placement API is still optional for
>   Ocata :/
> - merging the scheduler change calling the Placement API [1] in a point
>   release after we deliver Ocata (and still making the Placement API
>   mandatory for Ocata), so that we would be sure that all computes are
>   reporting their status to Placement by the time we restart the
>   scheduler in the point release.
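Just to make the second option above a bit more concrete, here is a very
rough sketch of what the fallback could look like on the scheduler side.
This is illustrative only, not the actual code in [1]; in particular the
get_filtered_resource_providers() call and the get_host_candidates() helper
are hypothetical names:

    # Rough sketch of "try Placement, fall back to all ComputeNodes".
    # Hypothetical helper, NOT the actual change proposed in [1].

    from nova import objects


    def get_host_candidates(context, reportclient, resources):
        """Prefer Placement resource providers, else all compute nodes."""
        try:
            # Hypothetical method: ask Placement for resource providers
            # that can satisfy the requested resources.
            rps = reportclient.get_filtered_resource_providers(resources)
        except Exception:
            # Placement endpoint missing or unreachable: the Newton-style
            # deployment without a Placement service.
            rps = None

        if not rps:
            # Old behaviour: the scheduler considers every ComputeNode.
            return objects.ComputeNodeList.get_all(context)

        # New behaviour: only consider the nodes backing the returned
        # providers. (The real patch would query only those nodes instead
        # of filtering in Python; this is just to show the fallback logic.)
        wanted_uuids = {rp['uuid'] for rp in rps}
        all_nodes = objects.ComputeNodeList.get_all(context)
        return [cn for cn in all_nodes if cn.uuid in wanted_uuids]

The obvious downside is exactly what's said above: as long as that fallback
path exists, a deployment can keep running Ocata without Placement at all.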
FWIW, another possible solution has been discussed upstream in the
#openstack-nova channel, proposed by Dan Smith: we could remove the
try-once behaviour implemented in the decorator, backport that to Newton
and cut a point release, which would allow the compute nodes to keep
trying to reconcile with the Placement API in a self-healing manner. That
would mean deployers would have to upgrade to the latest Newton point
release before upgrading to Ocata, which I think is the best-supported
model. I'll propose a patch for that in my series as a bottom change for
[1].

-Sylvain

> Thoughts?
> -Sylvain
>
> [1] https://review.openstack.org/#/c/417961/
>
> [2]
> https://github.com/openstack/nova/blob/180e6340a595ec047c59365465f36fed7a669ec3/nova/scheduler/client/report.py#L40-L67
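To illustrate what that would mean for the decorator linked at [2]: instead
of remembering a "gave up" state after the first failed lookup, the wrapper
would just log and return, so the next periodic task retries on its own. A
minimal sketch of the idea, with illustrative names rather than the actual
backport:

    # Minimal sketch only, not the real Newton backport: the point is that
    # no "disabled" flag is latched after a failure, so a compute node
    # self-heals once the Placement endpoint appears.

    import functools
    import logging

    from keystoneauth1 import exceptions as ks_exc

    LOG = logging.getLogger(__name__)


    def safe_connect(f):
        @functools.wraps(f)
        def wrapper(self, *args, **kwargs):
            try:
                return f(self, *args, **kwargs)
            except ks_exc.EndpointNotFound:
                LOG.warning('No Placement API endpoint found; will retry '
                            'on the next periodic task.')
            except ks_exc.MissingAuthPlugin:
                LOG.warning('No auth configured for the Placement API; '
                            'will retry on the next periodic task.')
            # Returning None on failure, with no state kept, means the next
            # call simply tries again.
        return wrapper

With that in place, a compute node started before the Placement service was
deployed would begin reporting its inventories on a later periodic task,
without needing a SIGHUP or restart.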