On 4/12/2017 12:30 PM, Mathieu Gagné wrote:
Thanks for starting this discussion. There is a lot to cover/answer.

On Tue, Apr 11, 2017 at 6:35 PM, Matt Riedemann <mriede...@gmail.com> wrote:

This is not discoverable at the moment, for the end user or cinder, so I'm
trying to figure out what the failure mode looks like.

This all starts on the cinder side to extend the size of the attached
volume. Cinder is going to have to see if Nova is new enough to handle this
(via the available API versions) before accepting the request and resizing
the volume. Then Cinder sends the event to Nova. This is where it gets
interesting.

On the Nova side, if all of the computes aren't new enough, we could just
fail the request outright with a 409. What does Cinder do then? Roll back the
volume resize?

This means an extend volume operation would need to check for Nova
support first.
This also means adding a new API call to fetch and discover such
capabilities per instance (from associated compute node).
If we want to catch errors in volume size extension in Nova, we will
need to find another way, since external events are async.

Today cinder can GET /versions from the compute API and tell if it should even start attempting volume extend or not for an attached volume. If the microversion support isn't there in the compute side, cinder should fail fast in the API. That's a detail for the cinder spec.
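As a rough illustration of that fail-fast check, here is a minimal sketch that takes an already-parsed version document from the compute API and decides whether a required microversion falls inside an advertised range. It assumes the document has the "versions" list with "min_version" and "version" fields; the function names and the "2.99" microversion are hypothetical placeholders, not the value the feature will actually use.

```python
def parse_microversion(text):
    """Turn a string like "2.42" into a comparable (major, minor) tuple."""
    major, minor = text.split(".")
    return int(major), int(minor)


def compute_api_supports(versions_doc, required):
    """Return True if any advertised API version range covers `required`."""
    needed = parse_microversion(required)
    for entry in versions_doc.get("versions", []):
        low = entry.get("min_version")
        high = entry.get("version")
        if not low or not high:
            # Entries without microversion support advertise empty strings.
            continue
        if parse_microversion(low) <= needed <= parse_microversion(high):
            return True
    return False


# Hypothetical microversion that would add the volume-extend event.
REQUIRED_MICROVERSION = "2.99"

# Sample document in the shape assumed above.
sample_doc = {
    "versions": [
        {"id": "v2.0", "status": "SUPPORTED", "version": "", "min_version": ""},
        {"id": "v2.1", "status": "CURRENT", "version": "2.42", "min_version": "2.1"},
    ]
}

if not compute_api_supports(sample_doc, REQUIRED_MICROVERSION):
    print("compute API too old: reject the extend request up front")
```

Comparing (major, minor) tuples rather than raw strings matters here, because "2.9" sorts after "2.42" as a string but before it as a microversion.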

Once the request reaches nova, we could technically look up the service version for the compute from the API, tell if it's new enough to support this capability, and fail fast if it isn't. I don't know if we'll do that, but we have it in our pocket. Either way, the Cinder side should handle an error response from Nova and proceed accordingly (roll back the volume extend).
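To make the service-version idea concrete, here is a small sketch of what such a check could look like. It is not Nova code: the minimum version number, the exception, the function names, and the plain dict of host to service version are all hypothetical stand-ins, and the 409 mapping simply follows the failure mode discussed earlier in the thread.

```python
# Hypothetical minimum compute service version that understands the event.
MIN_COMPUTE_SERVICE_VERSION = 20


class ComputeTooOldError(Exception):
    """Raised when the target compute cannot handle the extend event."""


def ensure_compute_supports_extend(host, service_versions,
                                   minimum=MIN_COMPUTE_SERVICE_VERSION):
    """Return the host's service version, or raise if it is too old.

    `service_versions` is a stand-in mapping of host name to reported
    service version; unknown hosts are treated as version 0.
    """
    version = service_versions.get(host, 0)
    if version < minimum:
        raise ComputeTooOldError(
            "compute %s reports service version %d, need >= %d"
            % (host, version, minimum))
    return version


def handle_extend_event(host, service_versions):
    """Map the check onto an HTTP-style (status, message) result."""
    try:
        ensure_compute_supports_extend(host, service_versions)
    except ComputeTooOldError as exc:
        # The caller (Cinder) would see the error and roll back the extend.
        return 409, str(exc)
    return 200, "accepted"
```

The point of the sketch is only that the check happens synchronously in the API, before anything is sent to the compute, so the caller gets an error it can act on.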


But let's say the computes are new enough, but the instance is on a compute
that does not support the operation. Then what? Do we register an instance
fault and put the instance into ERROR state? Then the admin would need to
intervene.

Are there other ideas? Until we have capabilities (info) exposed out of the
API we're stuck with questions like this.


Like TommyLike mentioned in a review, AWS introduced Live Volume
Modifications available on some instance types.
On instance types with limited support, you need to stop/start the
instance or detach/attach the volume.
On instances started before a certain date, you need to stop/start the
instance or detach/attach the volume at least once.
In all cases, the end user needs to extend the partition/filesystem in
the instance.

They have the luxury to fully control the environment and synchronize
the compute service with the volume service.
Perhaps (speculating) they even have bidirectional
orchestration/synchronization/communication between the two.

I have that same luxury since I only support one volume backend and
virt driver combination.
But I now start to grasp the extent of what adding such a feature
requires, especially when it implies cross-service support...

Yeah it's super fun isn't it. :) This is why it takes a long time to get some features into Nova.


We have a matrix of compute drivers and volume backends to support
with some combinations which might never support online volume
extension.
There is the desire for OpenStack to be interoperable between clouds
so there is a strong incentive to make it work for all combinations.

I will still take the liberty to ask:

Would it be in the realm of possibilities for a deployer to have to
explicitly enable this feature?
A deployer would be able to enable such a feature once all
services/components it chooses to deploy fully support online volume
extension.

Correct, I thought about this yesterday too. And this should be a detail in the Cinder spec for sure, but Cinder should probably have a specific policy check for attempting to extend an attached volume. Having said this, I see that Cinder has a "volume:extend" policy rule but I don't see it actually checked in the code, is that a bug?

But the idea is, you, as a deployer, could allow extending volumes that are not attached (using the existing volume:extend policy) but disable the ability to extend attached volumes (maybe new rule volume:extend_attached?). Then if you're running older computes, or not running libvirt/hyperv computes, etc, then you just disable the API entrypoint for the entire operation on the Cinder side.

^ should all be captured in the Cinder spec.
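A minimal sketch of the split-policy idea, assuming the rule for attached volumes ends up being called volume:extend_attached (a name that is only proposed here, not an existing rule) and modelling the deployer's policy as a plain set of enabled rule names rather than a real policy engine:

```python
EXTEND_RULE = "volume:extend"
# Proposed rule name from this thread; it does not exist today.
EXTEND_ATTACHED_RULE = "volume:extend_attached"


def rule_for_extend(volume_status):
    """Pick the policy rule to check based on the volume's status."""
    if volume_status == "in-use":
        return EXTEND_ATTACHED_RULE
    return EXTEND_RULE


def may_extend(volume_status, enabled_rules):
    """Return True if the deployer's enabled rules permit this extend."""
    return rule_for_extend(volume_status) in enabled_rules


# A deployer running older computes enables only the detached case.
conservative_policy = {EXTEND_RULE}

print(may_extend("available", conservative_policy))  # detached volume
print(may_extend("in-use", conservative_policy))     # attached volume
```

With this shape, a deployer who cannot support online extension leaves the attached rule disabled and the request is refused in the Cinder API before Nova is ever involved.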


I know it won't address cases where a mix of volume backends and
virt drivers are deployed.
So we would still need capabilities discoverability. This includes
volume type capabilities discoverability which I'm not sure exists
today.

Let's not get started on how Horizon will discover such capabilities per
instance/volume. That's another can of worms. =)

--
Mathieu

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



--

Thanks,

Matt
