On 10/16/2017 11:22 AM, Matt Riedemann wrote:
> This is interesting from the user point of view:
>
> https://bugs.launchpad.net/nova/+bug/1723880
>
> - The user creates an instance in a non-default AZ.
> - They shelve offload the instance.
> - The admin deletes the AZ that the instance was using, for whatever
>   reason.
> - The user unshelves the instance which goes back through scheduling and
>   fails with NoValidHost because the AZ on the original request spec no
>   longer exists.
>
> Now the question is what, if anything, do we do about this bug? Some notes:
>
> 1. How reasonable is it for a user to expect in a stable production
> environment that AZs are going to be deleted from under them? We
> actually have a spec related to this but with AZ renames:
>
> https://review.openstack.org/#/c/446446/

I don't think it's reasonable for a user to expect an AZ suddenly gets
*deleted* from under them, no.
That said, I think it's reasonable for operators to want to *rename* an
AZ. And because AZs in Nova aren't really *things* [1], attempting to
change the name of an AZ involves a bunch of nasty DB updates (including
shadow tables). [2]

> 2. Should we null out the instance.availability_zone when it's shelved
> offloaded like we do for the instance.host and instance.node attributes?
> Similarly, we would not take into account the
> RequestSpec.availability_zone when scheduling during unshelve. I tend to
> prefer this option because once you shelve offload an instance, it's
> no longer associated with a host and therefore no longer associated with
> an AZ. However, is it reasonable to assume that the user doesn't care
> that the instance, once unshelved, is no longer in the originally
> requested AZ? Probably not a safe assumption.

Yeah, I don't think this is appropriate.

> 3. When a user unshelves, they can't propose a new AZ (and I don't think
> we want to add that capability to the unshelve API). So if the original
> AZ is gone, should we automatically remove the
> RequestSpec.availability_zone when scheduling? I tend to not like this
> as it's very implicit and the user could see the AZ on their instance
> change before and after unshelve and be confused.

I don't think this is something we should add to the public API (for
reasons Matt stated in a followup email to Dean). Instead, I think the
"rename AZ" functionality should do the needful DB-related tasks to
change the instance.availability_zone for shelved instances to the new
AZ name...
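
To make that concrete, here is a rough sketch of what such a rename
would have to touch. The schema below is a hypothetical, heavily
simplified stand-in (it is not the real Nova schema), and rename_az is
an illustrative name, not an existing Nova function. The point is only
that the update must not filter on host or vm_state, so that shelved
offloaded instances get the new name too:

```python
import sqlite3

# Hypothetical, heavily simplified schema; real Nova tables have many more columns.
SCHEMA = """
CREATE TABLE aggregate_metadata (aggregate_id INTEGER, key TEXT, value TEXT);
CREATE TABLE instances (
    uuid TEXT PRIMARY KEY, vm_state TEXT, host TEXT, availability_zone TEXT);
CREATE TABLE shadow_instances (
    uuid TEXT PRIMARY KEY, vm_state TEXT, host TEXT, availability_zone TEXT);
"""


def rename_az(conn, old_name, new_name):
    """Rename an AZ everywhere its name is stored as a plain string."""
    with conn:  # single transaction: commit on success, roll back on error
        conn.execute(
            "UPDATE aggregate_metadata SET value = ? "
            "WHERE key = 'availability_zone' AND value = ?",
            (new_name, old_name),
        )
        # No host or vm_state filter here, so shelved offloaded rows
        # (host IS NULL) are renamed along with everything else.
        for table in ("instances", "shadow_instances"):
            conn.execute(
                f"UPDATE {table} SET availability_zone = ? "
                "WHERE availability_zone = ?",
                (new_name, old_name),
            )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO aggregate_metadata VALUES (1, 'availability_zone', 'az-old')")
conn.execute(
    "INSERT INTO instances VALUES ('inst-1', 'shelved_offloaded', NULL, 'az-old')")
conn.execute(
    "INSERT INTO instances VALUES ('inst-2', 'active', 'host1', 'az-other')")

rename_az(conn, "az-old", "az-new")
```

The RequestSpec.availability_zone discussed above would presumably need
the same treatment; it is left out of the sketch.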

> 4. We could simply do nothing about this specific bug and assert the
> behavior is correct. The user requested an instance in a specific AZ,
> shelved that instance and when they wanted to unshelve it, it's no
> longer available so it fails. The user would have to delete the instance
> and create a new instance from the shelve snapshot image in a new AZ. If
> we implemented Sylvain's spec in #1 above, maybe we don't have this
> problem going forward since you couldn't remove/delete an AZ when there
> are even shelved offloaded instances still tied to it.

I think it's reasonable to prevent deletion of an AZ (whatever that
actually means... see [1]) when the AZ "has instances in it" (whatever
that means... see [1])
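
As a rough illustration of such a guard, assuming "has instances in it"
is taken to mean "some instance record still carries that AZ name": the
Instance class, AZInUse and check_az_removable below are all
hypothetical names for the sake of the sketch, not Nova code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    uuid: str
    availability_zone: Optional[str]
    host: Optional[str]  # None once the instance is shelved offloaded


class AZInUse(Exception):
    """Hypothetical error: the AZ still has instances tied to it."""


def instances_in_az(instances, az_name):
    # Match on the stored AZ string rather than on host, so that
    # shelved offloaded instances (host is None) still count.
    return [i for i in instances if i.availability_zone == az_name]


def check_az_removable(instances, az_name):
    """Raise AZInUse if any instance still references az_name."""
    blockers = instances_in_az(instances, az_name)
    if blockers:
        uuids = ", ".join(i.uuid for i in blockers)
        raise AZInUse(f"AZ {az_name!r} still has instances: {uuids}")


instances = [
    Instance("inst-1", "az1", None),      # shelved offloaded
    Instance("inst-2", "az2", "host2"),   # running on a host
]
```

With this data, check_az_removable(instances, "az1") raises AZInUse even
though inst-1 has no host, while an AZ name that no instance references
passes the check.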
Best,
-jay
> Other options?

[1] AZs in Nova are just metadata key/values on aggregates and string
values in the instance.availability_zone DB table field that have no FK
relationship to said metadata key/values
[2] Note that, as I've said before, the entire concept of an
availability zone in Nova/Cinder/Neutron is completely fictional and
improperly pretending to be an AWS EC2 availability zone. AZs in Nova
pretend to be failure domains. They are not anything of the sort.
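
A toy model of footnote [1], for anyone who wants to see why the
unshelve in the bug report ends in NoValidHost. The dict layout and the
helper names are made up for illustration and only loosely mirror how
aggregates and instances are stored; the scheduler is reduced to a
single lookup.

```python
# Aggregates carry the AZ name as ordinary metadata.
aggregates = [
    {"name": "agg1", "hosts": ["host1"],
     "metadata": {"availability_zone": "az1"}},
    {"name": "agg2", "hosts": ["host2"],
     "metadata": {"availability_zone": "az2"}},
]

# The instance stores the AZ name as a bare string; nothing links it
# to the aggregate metadata above.
instance = {"uuid": "inst-1", "host": None, "availability_zone": "az2"}


def known_azs(aggregates):
    """The set of AZs is derived from aggregate metadata, not stored."""
    return {
        a["metadata"]["availability_zone"]
        for a in aggregates
        if "availability_zone" in a["metadata"]
    }


def hosts_in_az(aggregates, az_name):
    """Hosts belonging to any aggregate tagged with az_name."""
    return sorted(
        h
        for a in aggregates
        if a["metadata"].get("availability_zone") == az_name
        for h in a["hosts"]
    )


# "Deleting" the AZ is just dropping a metadata key. The instance
# record is untouched and now points at a name that no longer exists.
del aggregates[1]["metadata"]["availability_zone"]

dangling = instance["availability_zone"] not in known_azs(aggregates)
candidates = hosts_in_az(aggregates, instance["availability_zone"])
```

After the deletion, dangling is True and candidates is empty, which is
the situation the unshelve in the bug report runs into.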
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev