No On Fri, Jun 26, 2015 at 10:15 AM, Joe Gordon <joe.gord...@gmail.com> wrote:
>
> On Wed, Jun 24, 2015 at 11:44 AM, Joe Gordon <joe.gord...@gmail.com> wrote:
>
>> On Wed, Jun 24, 2015 at 11:03 AM, Sean Dague <s...@dague.net> wrote:
>>
>>> On 06/24/2015 01:41 PM, Russell Bryant wrote:
>>> > On 06/24/2015 01:31 PM, Joe Gordon wrote:
>>> >>
>>> >> On Tue, Jun 16, 2015 at 9:58 AM, Sean Dague <s...@dague.net
>>> >> <mailto:s...@dague.net>> wrote:
>>> >>
>>> >>     Back when Nova first wanted to test partial upgrade, we did a
>>> >>     bunch of slightly odd conditionals inside of grenade and
>>> >>     devstack to make it so that if you were very careful, you
>>> >>     could just not stop some of the old services on a single node,
>>> >>     upgrade everything else, and as long as the old services
>>> >>     didn't stop, they'd be running cached code in memory, and it
>>> >>     would look a bit like a 2 node worker not upgraded model. It
>>> >>     worked, but it was weird.
>>> >>
>>> >>     There has been some interest by the Nova team to expand what's
>>> >>     not being touched, as well as the Neutron team to add partial
>>> >>     upgrade testing support. Both are great initiatives, but I
>>> >>     think going about it the old way is going to add a lot of
>>> >>     complexity in weird places, and not be as good of a test as we
>>> >>     really want.
>>> >>
>>> >>     Nodepool now supports allocating multiple nodes. We have a
>>> >>     multinode job in Nova regularly testing live migration using
>>> >>     this.
>>> >>
>>> >>     If we slice this problem differently, I think we get a better
>>> >>     architecture, a much easier way to add new configs, and a much
>>> >>     more realistic end test.
>>> >>
>>> >>     Conceptually, use devstack-gate multinode support to set up 2
>>> >>     nodes, an all in one, and a worker. Let grenade upgrade the
>>> >>     all in one, leave the worker alone.
>>> >>
>>> >>     I think the only complexity here is the fact that grenade.sh
>>> >>     implicitly drives stack.sh.
>>> >>     Which means one of:
>>> >>
>>> >>     1) devstack-gate could build the worker first, then run
>>> >>     grenade.sh
>>> >>
>>> >>     2) we make it so grenade.sh can execute in parts more easily,
>>> >>     so it can hand something else running stack.sh for it.
>>> >>
>>> >>     3) we make grenade understand the subnode for partial upgrade,
>>> >>     so it will run the stack phase on the subnode itself (given
>>> >>     credentials).
>>> >>
>>> >>     This kind of approach means deciding which services you don't
>>> >>     want to upgrade doesn't require devstack changes, it's just a
>>> >>     change of the services on the worker.
>>> >>
>>> >>     We need a volunteer for taking this on, but I think all the
>>> >>     follow on partial upgrade support will be much much easier to
>>> >>     do after we have this kind of mechanism in place.
>>> >>
>>> >> I think this is a great approach for the future of partial upgrade
>>> >> support in grenade. I would like to point out that step 0 here is
>>> >> to get tempest passing consistently in multinode.
>>> >>
>>> >> Currently the neutron job is failing consistently, and nova-network
>>> >> fails roughly 10% of the time due to
>>> >> https://bugs.launchpad.net/nova/+bug/1462305
>>> >> and https://bugs.launchpad.net/nova/+bug/1445569
>>> >
>>> > If multi-node isn't reliable more generally yet, do you think the
>>> > simpler implementation of partial-upgrade testing could proceed? I've
>>> > already done all of the patches to do it for Neutron. That way we
>>> > could quickly get something in place to help block regressions and
>>> > work on the longer-term multinode refactoring without as much time
>>> > pressure.
>>>
>>> The thing is, these partial service bits are sneakier than one realizes
>>> over time. There have been all kinds of edge conditions that crept up
>>> on the n-cpu one that are really subtle because code is running in
>>> memory on stale versions of dependencies which are no longer on disk.
>>> And the number of people that have this model in their head is
>>> basically down to a SPOF.
>>>
>>
>> I agree. As the author of the current multinode job, it is definitely an
>> ugly hack (but one that has worked surprisingly well until now).
>>
>>>
>>> The fact that neutron-grenade is at a 40% fail rate right now (and has
>>> been for over a week) is not preventing anyone from just rechecking to
>>> get past it. So I think assuming additional failing grenade tests are
>>> going to keep folks from landing bugs is probably not a good
>>> assumption. Making the whole path more complicated for other people to
>>> debug is an explosion waiting to happen.
>>>
>>> So I do want to take a hard line on doing this right, because the debt
>>> here is higher than you might think. The partial code was always very
>>> conceptually fragile, and fails in really funny ways sometimes, because
>>> of the fact that old is not isolated from new in a way that would be
>>> expected.
>>>
>>
>> Assuming the smoke jobs work, I don't think making grenade do multinode
>> should take very long. In which case we get a much more realistic
>> upgrade situation.
>>
>
> Good news, it looks like both smoke jobs are working (ignoring failures
> from https://review.openstack.org/#/c/195748/).
> So next step is to teach grenade to do multinode.
>
>>
>>> I -1ed the n-net partial upgrade changes for the same reason.
>>>
>>>         -Sean
>>>
>>> --
>>> Sean Dague
>>> http://dague.net
>>>
>>> __________________________________________________________________________
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe:
>>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>
>