Re: [Openstack-operators] Maintenance

Jay Pipes Fri, 22 Apr 2016 12:51:07 -0700

On 04/14/2016 05:14 AM, Juvonen, Tomi (Nokia - FI/Espoo) wrote:
<snip>

As admin I want to know when host is ready to actions to be done by admin
during the maintenance. Meaning physical resources are emptied.

You are equating "host maintenance mode" with the end result of a call to `nova host-evacuate-live`. The two are not the same.

"host maintenance mode" typically just refers to taking a Nova compute node out of consideration for placing new workloads on that compute node. Putting a Nova compute node into host maintenance mode is as simple as calling `nova service-disable $hostname nova-compute`.

Depending on what you need to perform on the compute node that is in host maintenance mode, you *may* want to migrate the workloads from that compute node to some other compute node that isn't in host maintenance mode. The `nova host-evacuate $hostname` and `nova host-evacuate-live $hostname` commands in the Nova CLI [1] can be used to migrate or live-migrate all workloads off the target compute node.
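
As a rough sketch of that two-step sequence (disable the service first, then move workloads off), something like the following Python wrapper could drive the CLI. Only the `nova` commands themselves come from the text above; the function names, the `dry_run` switch, and the example hostname are made up for illustration, and the sketch defaults to printing the commands rather than running them.

```python
import shlex
import subprocess


def maintenance_commands(hostname, live=True):
    """Return the nova CLI invocations, in order, as argv lists."""
    evacuate = "host-evacuate-live" if live else "host-evacuate"
    return [
        # Step 1: stop the scheduler from placing new workloads here.
        ["nova", "service-disable", hostname, "nova-compute"],
        # Step 2 (optional): move existing workloads elsewhere.
        ["nova", evacuate, hostname],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(shlex.quote(arg) for arg in argv))
        if not dry_run:
            # Assumes the nova CLI is installed and credentials are set.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical hostname, dry run only.
    run(maintenance_commands("compute-01.example.com"))
```

Keeping the second step separate matters: a node can sit in maintenance mode with its workloads still running if the work you need to do does not disturb them.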

Live migration will reduce the disruption that tenant workloads (data plane) experience during the workload migration. However, research at Mirantis has shown that libvirt/KVM/QEMU live migration performed against workloads with even a medium rate of memory page dirtying can easily never complete. Solutions like auto-converge and xbzrle compression have minimal effect on this, unfortunately. Pausing a workload manually is typically what is done to force the live migration to complete.
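
A back-of-the-envelope way to see why: in a pre-copy migration each pass has to resend whatever memory was dirtied during the previous pass, so the remaining data only shrinks if the guest dirties memory more slowly than the link can ship it. The sketch below is a toy model in Python, not QEMU's actual algorithm; the function name, the constant rates, and all numbers are assumptions made up for illustration.

```python
def precopy_rounds(memory_mb, dirty_mb_s, bandwidth_mb_s, stop_copy_mb,
                   max_rounds=30):
    """Count copy passes until the leftover fits the final stop-and-copy.

    Returns the number of passes, or None if the migration has not
    converged after max_rounds passes.
    """
    remaining = float(memory_mb)
    for round_no in range(1, max_rounds + 1):
        # Time needed to send what is currently outstanding.
        transfer_s = remaining / bandwidth_mb_s
        # Memory dirtied meanwhile must be sent again, capped at guest RAM.
        remaining = min(float(memory_mb), dirty_mb_s * transfer_s)
        if remaining <= stop_copy_mb:
            return round_no
    return None


if __name__ == "__main__":
    # Dirty rate well below link speed: converges quickly.
    print(precopy_rounds(4096, 10, 100, 50))
    # Dirty rate above link speed: never converges in this model.
    print(precopy_rounds(4096, 120, 100, 50))
```

In this model pausing the workload amounts to setting the dirty rate to zero, which is why it forces the migration to finish.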

[1] Note that these are commands in the Nova CLI tool (python-novaclient). Neither a host-evacuate nor a host-evacuate-live REST API call exists in the Compute API. This fact alone should suggest to folks that the appropriate place to put logic associated with performing host maintenance tasks should be *outside* of Nova entirely...
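
To illustrate what "outside of Nova" looks like in practice, here is a minimal Python sketch of a client-side loop: list the servers on a host, then ask for each one to be live-migrated. The `client.servers.list(search_opts=...)` and `server.live_migrate()` calls are modeled on python-novaclient, but treat the exact signatures as assumptions and check them against your client version. The `Fake*` classes are stand-ins invented here so the sketch runs without a cloud.

```python
def evacuate_live(client, hostname):
    """Request a live migration for every server found on `hostname`."""
    servers = client.servers.list(
        search_opts={"host": hostname, "all_tenants": 1})
    results = {}
    for server in servers:
        try:
            # Let the scheduler pick the destination host.
            server.live_migrate()
            results[server.id] = "accepted"
        except Exception as exc:
            # One failure should not stop the rest of the loop.
            results[server.id] = "failed: %s" % exc
    return results


class FakeServer:
    """Stand-in for a server object (hypothetical, for the demo only)."""

    def __init__(self, server_id, fail=False):
        self.id = server_id
        self._fail = fail

    def live_migrate(self):
        if self._fail:
            raise RuntimeError("no valid host")


class FakeServers:
    def __init__(self, by_host):
        self._by_host = by_host

    def list(self, search_opts=None):
        return self._by_host.get((search_opts or {}).get("host"), [])


class FakeClient:
    def __init__(self, by_host):
        self.servers = FakeServers(by_host)


if __name__ == "__main__":
    demo = FakeClient({"c1": [FakeServer("a"), FakeServer("b", fail=True)]})
    print(evacuate_live(demo, "c1"))
```

All of the orchestration (ordering, retries, what to do on failure) lives in the caller, which is exactly where an operator's maintenance tooling can add its own policy.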

As owner of a server I want to prepare for maintenance to minimize downtime,
keep capacity on needed level and switch HA service to server not
affected by maintenance.

This isn't an appropriate use case, IMHO. HA control planes should, by their very nature, be established across various failure domains. The whole *point* of having an HA service is so that you don't need to "prepare" for some maintenance event (planned or unplanned).

All HA control planes worth their salt will be able to notify some external listener of a partition in the cluster. This HA control plane is the responsibility of the tenant, not the infrastructure (i.e. Nova). I really do not want to add coupling between infrastructure control plane services and tenant control plane services.

As owner of a server I want to know when my servers will be down because of
host maintenance as it might be servers are not moved to another host.

See above. As an owner of a server involved in an HA cluster, it is *the server owner's* responsibility to set things up so that the cluster rebalances, handles redirected load, or does the custom thing that they want. This isn't, IMHO, the domain of the NFVI but rather a much higher-level NFVO orchestration layer.

As owner of a server I want to know if host is to be totally removed, so
instead of keeping my servers on host during maintenance, I want to move
them to somewhere else.

This isn't something the owner of a server even knows about in a cloud environment. Owners of a server don't (and shouldn't) know which compute node they are on, nor should they know that a host is having a planned or unplanned host maintenance event.

The infrastructure owner (cloud deployer/operator) is responsible for doing the needful and performing a [live] migration of workloads off of a failing host or a host that is undergoing a cold upgrade. The tenant doesn't know anything about these things, and shouldn't.

As owner of a server I want to send acknowledgement to be ready for host
maintenance and I want to state if servers are to be moved or kept on host.

This is describing some virtual inventory management or CMDB functionality that isn't in scope for infrastructure services like Nova. Perhaps it's worth looking into how something like Remedy can manage your virtual inventory in this manner, but I don't see this being in the OpenStack realm really...

FWIW, this is the same objection I had to Tacker joining the OpenStack Big Tent. It is essentially a monolithic, purpose-built-for-Telco application that orchestrates VNFs at layers way above the OpenStack deployment.


Best,
-jay

Removal and creating of server is in owner's control already. Optionally
server configuration data could hold information about automatic actions to
be done when host is going down unexpectedly or in controlled manner. Also
actions at the same if down permanently or only temporarily. Still this needs
acknowledgement from server owner as he needs time for application level
controlled HA service switchover.
Br,
Tomi


_______________________________________________
OpenStack-operators mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


