On Wed, Jun 24, 2015 at 11:03 AM, Sean Dague <s...@dague.net> wrote:

> On 06/24/2015 01:41 PM, Russell Bryant wrote:
> > On 06/24/2015 01:31 PM, Joe Gordon wrote:
> >>
> >> On Tue, Jun 16, 2015 at 9:58 AM, Sean Dague <s...@dague.net> wrote:
> >>
> >> Back when Nova first wanted to test partial upgrade, we did a bunch of
> >> slightly odd conditionals inside of grenade and devstack to make it so
> >> that if you were very careful, you could just not stop some of the old
> >> services on a single node, upgrade everything else, and as long as the
> >> old services didn't stop, they'd be running cached code in memory, and
> >> it would look a bit like a 2 node worker not upgraded model. It worked,
> >> but it was weird.
> >>
> >> There has been some interest by the Nova team to expand what's not being
> >> touched, as well as the Neutron team to add partial upgrade testing
> >> support. Both are great initiatives, but I think going about it the old
> >> way is going to add a lot of complexity in weird places, and not be as
> >> good of a test as we really want.
> >>
> >> Nodepool now supports allocating multiple nodes. We have a multinode job
> >> in Nova regularly testing live migration using this.
> >>
> >> If we slice this problem differently, I think we get a better
> >> architecture, a much easier way to add new configs, and a much more
> >> realistic end test.
> >>
> >> Conceptually, use devstack-gate multinode support to set up 2 nodes, an
> >> all in one, and a worker. Let grenade upgrade the all in one, leave the
> >> worker alone.
> >>
> >> I think the only complexity here is the fact that grenade.sh implicitly
> >> drives stack.sh. Which means one of:
> >>
> >> 1) devstack-gate could build the worker first, then run grenade.sh
> >>
> >> 2) we make it so grenade.sh can execute in parts more easily, so it can
> >> hand something else running stack.sh for it.
> >>
> >> 3) we make grenade understand the subnode for partial upgrade, so it
> >> will run the stack phase on the subnode itself (given credentials).
> >>
> >> This kind of approach means deciding which services you don't want to
> >> upgrade doesn't require devstack changes, it's just a change of the
> >> services on the worker.
> >>
> >> We need a volunteer for taking this on, but I think all the follow on
> >> partial upgrade support will be much much easier to do after we have
> >> this kind of mechanism in place.
> >>
> >>
> >> I think this is a great approach for the future of partial upgrade
> >> support in grenade. I would like to point out that step 0 here is to get
> >> tempest passing consistently in multinode.
> >>
> >> Currently the neutron job is failing consistently, and nova-network
> >> fails roughly 10% of the time due
> >> to https://bugs.launchpad.net/nova/+bug/1462305
> >> and https://bugs.launchpad.net/nova/+bug/1445569
> >
> > If multi-node isn't reliable more generally yet, do you think the
> > simpler implementation of partial-upgrade testing could proceed? I've
> > already done all of the patches to do it for Neutron. That way we could
> > quickly get something in place to help block regressions and work on the
> > longer-term multinode refactoring without as much time pressure.
>
> The thing is, these partial service bits are sneakier than one realizes
> over time. There have been all kinds of edge conditions that crept up on
> the n-cpu one that are really subtle because code is running in memory
> on stale versions of dependencies which are no longer on disk. And the
> number of people that have this model in their head is basically down to
> a SPOF.
>
I agree. As the author of the current multinode job, I can confirm it is
definitely an ugly hack (but one that has worked surprisingly well until now).

> The fact that neutron-grenade is at a 40% fail rate right now (and has
> been for over a week) is not preventing anyone from just rechecking to
> get past it. So I think assuming additional failing grenade tests are
> going to keep folks from landing bugs is probably not a good assumption.
> Making the whole path more complicated for other people to debug is an
> explosion waiting to happen.
>
> So I do want to take a hard line on doing this right, because the debt
> here is higher than you might think. The partial code was always very
> conceptually fragile, and fails in really funny ways sometimes, because
> of the fact that old is not isolated from new in a way that would be
> expected.

Assuming the smoke jobs work, I don't think making grenade do multinode
should take very long, in which case we get a much more realistic upgrade
situation.

> I -1ed the n-net partial upgrade changes for the same reason.
>
> -Sean
>
> --
> Sean Dague
> http://dague.net
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev