Re: [openstack-dev] [nova] Interesting bug when unshelving an instance in an AZ and the AZ is gone

Jay Pipes Sun, 22 Oct 2017 05:03:58 -0700

On 10/16/2017 11:22 AM, Matt Riedemann wrote:

This is interesting from the user point of view:
https://bugs.launchpad.net/nova/+bug/1723880

- The user creates an instance in a non-default AZ.
- They shelve offload the instance.
- The admin deletes the AZ that the instance was using, for whatever reason.
- The user unshelves the instance, which goes back through scheduling and fails with NoValidHost because the AZ on the original request spec no longer exists.
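To make the failure mode concrete, here is a toy Python model of the steps above. It is a sketch, not Nova code. `Aggregate`, `RequestSpec`, `hosts_in_az` and `schedule` are hypothetical stand-ins. "Deleting" the AZ is modeled as dropping the `availability_zone` metadata key from its aggregate, on the assumption that an AZ exists only as that metadata (see [1]).

```python
# Toy model of bug 1723880: a request spec that still names an AZ which
# no longer exists. All class and function names here are hypothetical;
# this is not Nova's scheduler code.
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class NoValidHost(Exception):
    """Stand-in for Nova's NoValidHost scheduling error."""


@dataclass
class Aggregate:
    name: str
    hosts: List[str]
    # Assumption: an AZ is only a metadata value on an aggregate.
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class RequestSpec:
    # The AZ the user originally asked for; None means "no AZ requested".
    availability_zone: Optional[str] = None


def hosts_in_az(aggregates: List[Aggregate], az: str) -> List[str]:
    """Return hosts whose aggregate carries availability_zone == az."""
    return [host
            for agg in aggregates
            if agg.metadata.get("availability_zone") == az
            for host in agg.hosts]


def schedule(spec: RequestSpec, aggregates: List[Aggregate],
             all_hosts: List[str]) -> str:
    """Pick the first host that satisfies the spec's AZ constraint."""
    if spec.availability_zone is None:
        candidates = list(all_hosts)
    else:
        candidates = hosts_in_az(aggregates, spec.availability_zone)
    if not candidates:
        raise NoValidHost(f"no hosts in AZ {spec.availability_zone!r}")
    return candidates[0]


if __name__ == "__main__":
    aggregates = [Aggregate("agg1", ["host1"], {"availability_zone": "az1"})]
    spec = RequestSpec(availability_zone="az1")   # user boots into az1
    print(schedule(spec, aggregates, ["host1", "host2"]))  # host1

    # Shelve offload frees host1, but the spec still remembers az1.
    # The admin then "deletes" az1 by dropping the metadata key.
    del aggregates[0].metadata["availability_zone"]
    try:
        schedule(spec, aggregates, ["host1", "host2"])   # the unshelve
    except NoValidHost as exc:
        print("unshelve failed:", exc)  # unshelve failed: no hosts in AZ 'az1'
```

Nothing in the model ties the string in `RequestSpec` to the aggregate metadata, so removing the metadata silently strands the spec.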

Now the question is what, if anything, we should do about this bug. Some notes:

1. How reasonable is it for a user to expect in a stable production environment that AZs are going to be deleted from under them? We actually have a spec related to this but with AZ renames:
https://review.openstack.org/#/c/446446/

I don't think it's reasonable for a user to expect that an AZ suddenly gets *deleted* from under them, no.

That said, I think it's reasonable for operators to want to *rename* an AZ. And because AZs in Nova aren't really *things* [1], attempting to change the name of an AZ involves a bunch of nasty DB updates (including shadow tables). [2]
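For illustration, a rename could look roughly like the following. The sketch assumes a heavily simplified schema: the `aggregate_metadata`, `instances` and `shadow_instances` tables and their columns here are hypothetical approximations, not Nova's actual schema. What it shows is that, with no FK between the instance's AZ string and the aggregate metadata, every table holding the string must be rewritten explicitly, ideally in one transaction. It does not touch the stored RequestSpec. Per the bug report, that spec also carries the AZ and would need the same treatment.

```python
# Sketch of an "AZ rename" as plain SQL over a heavily simplified schema.
# Table and column names are assumptions; Nova's real schema (and its
# serialized RequestSpec) is not reproduced here.
import sqlite3
from typing import Dict

SCHEMA = """
CREATE TABLE aggregate_metadata (aggregate_id INTEGER, key TEXT, value TEXT);
CREATE TABLE instances (uuid TEXT, host TEXT, availability_zone TEXT);
CREATE TABLE shadow_instances (uuid TEXT, host TEXT, availability_zone TEXT);
"""


def rename_az(conn: sqlite3.Connection, old: str, new: str) -> Dict[str, int]:
    """Rewrite every stored copy of the AZ name in one transaction.

    With no FK tying instances to aggregate metadata, nothing cascades:
    each table holding the string is updated explicitly.
    """
    counts = {}
    with conn:  # commits on success, rolls back if any UPDATE fails
        cur = conn.execute(
            "UPDATE aggregate_metadata SET value = ? "
            "WHERE key = 'availability_zone' AND value = ?", (new, old))
        counts["aggregate_metadata"] = cur.rowcount
        # Table names come from this fixed tuple, never from user input.
        for table in ("instances", "shadow_instances"):
            # Matches on the AZ string alone, so shelved offloaded
            # instances (host IS NULL) are renamed too.
            cur = conn.execute(
                f"UPDATE {table} SET availability_zone = ? "
                "WHERE availability_zone = ?", (new, old))
            counts[table] = cur.rowcount
    return counts


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO aggregate_metadata VALUES (1, 'availability_zone', 'az1')")
    conn.executemany("INSERT INTO instances VALUES (?, ?, ?)", [
        ("vm-a", "host1", "az1"),   # running in az1
        ("vm-b", None, "az1"),      # shelved offloaded: no host
        ("vm-c", "host9", "az2"),   # unrelated AZ
    ])
    conn.execute("INSERT INTO shadow_instances VALUES ('vm-old', NULL, 'az1')")
    print(rename_az(conn, "az1", "az1-east"))
    # {'aggregate_metadata': 1, 'instances': 2, 'shadow_instances': 1}
```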

2. Should we null out the instance.availability_zone when it's shelved offloaded, like we do for the instance.host and instance.node attributes? Similarly, we would not take into account the RequestSpec.availability_zone when scheduling during unshelve. I tend to prefer this option because once you shelve offload an instance, it's no longer associated with a host and therefore no longer associated with an AZ. However, is it reasonable to assume that the user doesn't care that the instance, once unshelved, is no longer in the originally requested AZ? Probably not a safe assumption.


Yeah, I don't think this is appropriate.
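To make option 2's trade-off concrete, a minimal sketch of it follows. `Instance`, `shelve_offload` and `unshelve` are hypothetical toy functions, not Nova's compute API. Picking the first host by name stands in for real scheduling. The output shows the concern raised above: the instance comes back in a different AZ than the one the user asked for, and nothing tells them so.

```python
# Sketch of option 2: forget the AZ at shelve offload and ignore it at
# unshelve. Instance fields mirror those named in the thread; the
# functions and the host-picking rule are hypothetical.
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Instance:
    uuid: str
    host: Optional[str]
    node: Optional[str]
    availability_zone: Optional[str]


def shelve_offload(inst: Instance) -> None:
    # Already done today, per the thread: host and node are cleared.
    inst.host = None
    inst.node = None
    # Option 2 adds: the AZ is cleared as well.
    inst.availability_zone = None


def unshelve(inst: Instance, host_az: Dict[str, str]) -> None:
    # With no AZ to honor, any host qualifies; pick the first by name.
    host = min(host_az)
    inst.host = inst.node = host   # toy model: one node per host
    inst.availability_zone = host_az[host]


if __name__ == "__main__":
    vm = Instance("vm-1", "host-b", "host-b", "az2")   # user asked for az2
    shelve_offload(vm)
    unshelve(vm, {"host-a": "az1", "host-b": "az2"})
    print(vm.availability_zone)  # az1: silently moved out of the requested AZ
```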

3. When a user unshelves, they can't propose a new AZ (and I don't think we want to add that capability to the unshelve API). So if the original AZ is gone, should we automatically remove the RequestSpec.availability_zone when scheduling? I tend to not like this as it's very implicit and the user could see the AZ on their instance change before and after unshelve and be confused.

I don't think this is something we should add to the public API (for reasons Matt stated in a follow-up email to Dean). Instead, I think the "rename AZ" functionality should do the needful DB-related tasks to change the instance.availability_zone for shelved instances to the new AZ name...

4. We could simply do nothing about this specific bug and assert the behavior is correct. The user requested an instance in a specific AZ, shelved that instance, and when they wanted to unshelve it, the AZ was no longer available, so it fails. The user would have to delete the instance and create a new instance from the shelve snapshot image in a new AZ. If we implemented Sylvain's spec in #1 above, maybe we don't have this problem going forward, since you couldn't remove/delete an AZ when there are even shelved offloaded instances still tied to it.

I think it's reasonable to prevent deletion of an AZ (whatever that actually means... see [1]) when the AZ "has instances in it" (whatever that means... see [1]).
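A guard along those lines might be sketched as below. The names (`delete_az`, `AZInUse`) and data shapes are hypothetical. The sketch assumes "has instances in it" means any non-deleted instance whose availability_zone string matches, including shelved offloaded instances that have no host. Since there is no FK relationship to lean on (see [1]), the check is a plain string comparison.

```python
# Sketch of a guard that refuses to delete an AZ while instances still
# reference it. Data shapes and names are hypothetical; "in use" is
# decided by comparing AZ strings, since there is no FK to check.
from typing import Dict, List


class AZInUse(Exception):
    """Raised when an AZ still has instances, shelved offloaded included."""


def delete_az(aggregate_metadata: List[Dict[str, str]],
              instances: List[dict], az: str) -> None:
    # Assumption: "has instances in it" means any non-deleted instance whose
    # availability_zone matches, whether or not it currently has a host.
    users = sorted(i["uuid"] for i in instances
                   if i["availability_zone"] == az and not i["deleted"])
    if users:
        raise AZInUse(f"AZ {az!r} still used by: {', '.join(users)}")
    # "Deleting" the AZ here just drops the metadata key wherever it appears.
    for md in aggregate_metadata:
        if md.get("availability_zone") == az:
            del md["availability_zone"]


if __name__ == "__main__":
    metadata = [{"availability_zone": "az1"}]
    instances = [{"uuid": "vm-b", "host": None,      # shelved offloaded
                  "availability_zone": "az1", "deleted": False}]
    try:
        delete_az(metadata, instances, "az1")
    except AZInUse as exc:
        print(exc)  # AZ 'az1' still used by: vm-b
    instances[0]["deleted"] = True
    delete_az(metadata, instances, "az1")
    print(metadata)  # [{}]
```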


Best,
-jay

Other options?

[1] AZs in Nova are just metadata key/values on aggregates and string values in the instance.availability_zone DB table field that have no FK relationship to said metadata key/values.

[2] Note that, as I've said before, the entire concept of an availability zone in Nova/Cinder/Neutron is completely fictional and improperly pretending to be an AWS EC2 availability zone. AZs in Nova pretend to be failure domains. They are not anything of the sort.


__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
