On Wed, 2 May 2018 17:45:37 -0500, Matt Riedemann wrote:
On 5/2/2018 5:39 PM, Jay Pipes wrote:
My personal preference is to add less technical debt and go with a
solution that checks if image traits have changed in nova-api and if so,
simply refuse to perform a rebuild.
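A minimal sketch of the "refuse the rebuild if the image traits changed" check, to make the proposal concrete. It assumes required traits appear as image properties of the form trait:<NAME>=required. The helper names (required_traits, check_rebuild_traits_unchanged) and the RebuildRejected error are hypothetical, not nova code.

```python
# Hypothetical sketch of a nova-api side check that only compares the
# old and new images' required traits. Assumes image properties of the
# form "trait:<NAME>": "required"; all names here are illustrative.

TRAIT_PREFIX = "trait:"


class RebuildRejected(Exception):
    """Hypothetical error raised when the rebuild is refused."""


def required_traits(image_props):
    """Return the set of trait names an image marks as required."""
    return {
        key[len(TRAIT_PREFIX):]
        for key, value in image_props.items()
        if key.startswith(TRAIT_PREFIX) and value == "required"
    }


def check_rebuild_traits_unchanged(old_image_props, new_image_props):
    """Refuse the rebuild if the new image requires a different trait set."""
    old = required_traits(old_image_props)
    new = required_traits(new_image_props)
    if old != new:
        raise RebuildRejected(
            "image traits changed: %s -> %s" % (sorted(old), sorted(new)))
```

Note that this check never consults the compute node's current traits; it only compares the two images.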
So, what if when I created my server, the image I used, let's say
image1, had required trait A and that fit the host.
Then some external service removes (or somehow changes) trait A on the
compute node resource provider (because people can and will do this;
there are a few VMware specs up that rely on being able to manage traits
out of band from nova), and then I rebuild my server with image2, which
also has required trait A. That would match the original trait A in image1
and we'd say, "yup, lgtm!" and do the rebuild even though the compute
node resource provider wouldn't have trait A anymore.
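A hypothetical illustration of that failure mode: comparing image1 against image2 says the rebuild is fine, while comparing image2 against the provider's current traits does not. The trait sets and function names below are made up for the example.

```python
# Hypothetical illustration: an image-vs-image comparison cannot notice
# that the compute node provider lost a trait out of band.


def images_agree(old_required, new_required):
    """Image-only check: compares the two images and nothing else."""
    return old_required == new_required


def host_satisfies(required, provider_traits):
    """Host check: does the provider still expose every required trait?"""
    return required <= provider_traits


image1_required = {"CUSTOM_A"}
image2_required = {"CUSTOM_A"}
provider_traits = {"HW_CPU_X86_AVX2"}  # CUSTOM_A was removed out of band

print(images_agree(image1_required, image2_required))    # True: rebuild allowed
print(host_satisfies(image2_required, provider_traits))  # False: host lacks CUSTOM_A
```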
Having said that, this could technically happen before traits too, if the
operator changed something on the underlying compute host that
invalidated instances running on that host. But I'd think if that
happened, the operator would be migrating everything off the host and
disabling it for scheduling before making that kind of change, say
changing the hypervisor, or something less drastic that still
invalidates image properties.
This is a scenario I was thinking about too. In the land of software
licenses, this would be analogous to removing a license from a compute
host, say. The instance is already there but should we let a rebuild
proceed that is going to violate the image traits currently supported by
that host? Do we potentially prolong the life of that instance by
letting it be re-imaged?
I'm late to this thread, but I finally went through the replies, and my
thought is that we should do a pre-flight check with placement to verify
whether the requested image traits 1) are supported by the compute host
the instance is residing on and 2) coincide with the already-existing
allocations, rather than making an assumption based on "last image" vs
"new image" and artificially limiting a rebuild that should be valid to
go ahead. I can imagine scenarios where a user is trying to do a rebuild
that their cloud admin says should be perfectly valid on their
hypervisor, but it gets rejected because old image traits != new
image traits. That seems like unnecessary pain for users and admins.
It doesn't seem correct to reject the request if the current compute
host can fulfill it, and if I understood correctly, we have placement
APIs we can call from the conductor to verify that the image traits
requested for the rebuild can be fulfilled. Is there a reason not to do that?
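A rough sketch of the pre-flight idea, under stated assumptions: the instance's allocations are shaped like the "allocations" member of a placement GET /allocations/{consumer_uuid} response (keyed by resource provider UUID), and get_provider_traits is a hypothetical stand-in for calling GET /resource_providers/{uuid}/traits. Only providers the instance already allocates from are consulted, so the check is about the host the instance is already on.

```python
from typing import Callable, Dict, Iterable, Set


def preflight_rebuild_check(
    required: Set[str],
    allocations: Dict[str, dict],
    get_provider_traits: Callable[[str], Iterable[str]],
) -> Set[str]:
    """Return the required traits that no allocated provider exposes.

    Hypothetical helper. ``allocations`` is keyed by resource provider
    UUID; ``get_provider_traits`` stands in for a placement traits lookup.
    An empty result means the current host can satisfy the new image.
    """
    available: Set[str] = set()
    for rp_uuid in allocations:
        # Only look at providers the instance already consumes from.
        available.update(get_provider_traits(rp_uuid))
    return set(required) - available


traits_by_rp = {"cn-1": {"HW_CPU_X86_AVX2", "CUSTOM_A"}}
allocs = {"cn-1": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}, "generation": 3}}
print(preflight_rebuild_check({"CUSTOM_A"}, allocs, lambda rp: traits_by_rp[rp]))  # set()
```

Taking the union of traits across all allocated providers is a simplification; with nested providers a trait may need to be on a specific provider, which a real check would have to account for.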
-melanie
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev