On Wed, 2 May 2018 17:45:37 -0500, Matt Riedemann wrote:
> On 5/2/2018 5:39 PM, Jay Pipes wrote:
>> My personal preference is to add less technical debt and go with a
>> solution that checks if image traits have changed in nova-api and if so,
>> simply refuse to perform a rebuild.
>
> So, what if, when I created my server, the image I used (let's say
> image1) had required trait A, and that fit the host.
>
> Then some external service removes (or somehow changes) trait A on the
> compute node resource provider (because people can and will do this;
> there are a few VMware specs up that rely on being able to manage traits
> out of band from nova), and then I rebuild my server with image2, which
> also has required trait A. That would match the original trait A in
> image1, and we'd say, "yup, lgtm!" and do the rebuild even though the
> compute node resource provider no longer has trait A.
>
> Having said that, this could technically have happened even before
> traits existed, if the operator changed something on the underlying
> compute host that invalidated instances running on that host. But I'd
> think that if that happened, the operator would migrate everything off
> the host and disable it for scheduling before making that kind of
> change, say, changing the hypervisor, or something less drastic that
> still invalidates image properties.
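
To make the counterexample concrete, here is a minimal Python sketch (not Nova code; the function name and the trait name `CUSTOM_TRAIT_A` are hypothetical) of a check that only compares the old and new images' required traits, as in Jay's proposal. In Matt's scenario it approves the rebuild even though the host no longer has the trait:

```python
from __future__ import annotations


def traits_changed_check(old_image_traits: set[str], new_image_traits: set[str]) -> bool:
    """Hypothetical check: allow a rebuild only if the required traits are unchanged."""
    return old_image_traits == new_image_traits


# Matt's scenario: image1 and image2 both require trait A, but an external
# service has since removed trait A from the compute node resource provider.
host_traits: set[str] = set()  # trait A was removed out of band
image1_traits = {"CUSTOM_TRAIT_A"}
image2_traits = {"CUSTOM_TRAIT_A"}

allowed = traits_changed_check(image1_traits, image2_traits)
host_satisfies = image2_traits <= host_traits  # does the host still expose every required trait?
print(allowed, host_satisfies)  # True False
```

The comparison never looks at the host, so it cannot notice traits that changed out of band.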

This is a scenario I was thinking about too. In the world of software licenses, this would be analogous to, say, removing a license from a compute host. The instance is already there, but should we let a rebuild proceed that will violate the image traits the host currently supports? Would we be prolonging the life of that instance by letting it be re-imaged?

I'm late to this thread, but I finally went through the replies, and my thought is that we should do a pre-flight check with placement to verify whether the requested image traits are 1) supported by the compute host the instance resides on and 2) consistent with the instance's existing allocations. That is better than making an assumption based on "last image" vs. "new image" and artificially blocking a rebuild that should be allowed to go ahead. I can imagine scenarios where a user tries a rebuild that their cloud admin says should be perfectly valid on their hypervisor, but it gets rejected because the old image traits != the new image traits. That seems like unnecessary pain for users and admins.
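
A minimal Python sketch of the pre-flight check described above, under stated assumptions: the function name, the `"host-rp"` provider placeholder, and the in-memory dictionaries standing in for placement data are all hypothetical, and point 2) is read here as "the instance already holds allocations against the provider that exposes the traits." It decides based on what the current host supports, not on what the old image required:

```python
from __future__ import annotations


def preflight_rebuild_check(
    required_traits: set[str],
    host_rp_uuid: str,
    rp_traits: dict[str, set[str]],
    allocations: dict[str, dict[str, int]],
) -> bool:
    """Hypothetical pre-flight check for a rebuild on the instance's current host.

    required_traits: traits required by the *new* image.
    rp_traits: resource provider UUID -> traits currently on that provider.
    allocations: resource provider UUID -> {resource class: amount} held by the instance.
    """
    # 1) The compute node provider must expose every trait the new image requires.
    host_traits = rp_traits.get(host_rp_uuid, set())
    if not required_traits <= host_traits:
        return False
    # 2) Assumption: the instance's existing allocations must be against that same
    #    provider, so the rebuild needs no new allocations.
    return host_rp_uuid in allocations


# Trait A was removed from the host out of band: rejected.
print(preflight_rebuild_check(
    {"CUSTOM_TRAIT_A"}, "host-rp", {"host-rp": set()}, {"host-rp": {"VCPU": 2}}))
# Old and new image traits differ, but the host supports the new image: allowed.
print(preflight_rebuild_check(
    {"HW_CPU_X86_AVX2"}, "host-rp", {"host-rp": {"HW_CPU_X86_AVX2"}}, {"host-rp": {"VCPU": 2}}))
```

Unlike an old-vs-new comparison, this rejects the out-of-band trait removal and allows a rebuild whose traits changed but are still satisfied by the host.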

It doesn't seem right to reject the request if the current compute host can fulfill it. If I understand correctly, we have placement APIs we can call from the conductor to verify that the image traits requested for the rebuild can be fulfilled. Is there a reason not to do that?
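
As an illustration of the placement side, this sketch builds a request for placement's `GET /resource_providers/{uuid}/traits` call (available from placement microversion 1.6) and checks a response body for missing traits. The endpoint URL, token, and helper names are placeholders; the conductor would go through Nova's own placement client code rather than raw `urllib`, so treat this only as a sketch of the data involved:

```python
from __future__ import annotations

import json
import urllib.request

# Resource provider trait endpoints require placement microversion >= 1.6.
PLACEMENT_MICROVERSION = "placement 1.6"


def build_traits_request(placement_url: str, rp_uuid: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a resource provider's traits."""
    return urllib.request.Request(
        f"{placement_url.rstrip('/')}/resource_providers/{rp_uuid}/traits",
        headers={
            "X-Auth-Token": token,
            "OpenStack-API-Version": PLACEMENT_MICROVERSION,
            "Accept": "application/json",
        },
        method="GET",
    )


def missing_traits(required: set[str], response_body: bytes) -> set[str]:
    """Return the required traits that the provider does not currently have.

    Expected body shape:
    {"resource_provider_generation": 1, "traits": ["HW_CPU_X86_AVX2", ...]}
    """
    provider_traits = set(json.loads(response_body)["traits"])
    return required - provider_traits
```

Sending the request would be `urllib.request.urlopen(req).read()`; an empty result from `missing_traits` means the current host can satisfy the new image, so the rebuild could proceed.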

-melanie

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev