On Fri, May 25, 2018 at 12:19 AM, Matt Riedemann <mriede...@gmail.com> wrote:
> I've written a nova-manage placement heal_allocations CLI [1], which was
> a TODO from the PTG in Dublin, as a step toward getting existing
> CachingScheduler users to roll off of that (which is deprecated).
>
> During the CERN cells v1 upgrade talk it was pointed out that CERN was
> able to go from placement-per-cell to centralized placement in Ocata
> because the nova-computes in each cell would automatically recreate the
> allocations in placement in a periodic task, but that code is gone once
> you've upgraded to Pike or later.
>
> In various other talks during the summit this week, we've talked about
> things that can go wrong during upgrades. For instance, if placement is
> down for some reason during an upgrade and a user deletes an instance,
> the allocation doesn't get cleaned up in placement, so it will continue
> counting against resource usage on that compute node even though the
> server instance in nova is gone. This CLI could be expanded to help
> clean up situations like that, e.g. provide it a specific server ID and
> the CLI can figure out if it needs to clean things up in placement.
>
> So there are plenty of things we can build into this, but the patch is
> already quite large. I expect we'll also be backporting this to stable
> branches to help operators upgrade and fix allocation issues. It
> already has several things listed in an inline code comment to build
> into this later.
>
> My question is: is this good enough for a first iteration, or is
> something severely missing before we can merge it, like the automatic
> marker tracking mentioned in the code (which will probably be a
> non-trivial amount of code to add)? I could really use some operator
> feedback on this: just take a look at what it is already capable of,
> and if it's not going to be useful in this iteration, let me know
> what's missing and I can add that to the patch.
>
> [1] https://review.openstack.org/#/c/565886/

It does sound to me like a good way to help operators. That said, given
that I'm now working on using Nested Resource Providers for VGPU
inventories, I wonder about a possible upgrade problem with VGPU
allocations. Given that:

- in Queens, VGPU inventories are on the root RP (i.e. the compute node
  RP), but
- in Rocky, VGPU inventories will be on child RPs (i.e. against a
  specific VGPU type),

then if we have VGPU allocations in Queens, we should maybe recreate
those allocations against the new child inventories when upgrading to
Rocky? Hope you see the problem with upgrading by creating nested RPs?
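To make that concrete, here is a rough sketch of the kind of allocation
rewrite I think the upgrade would need, done by hand against the
placement REST API. This is only an illustration, not something
heal_allocations does today; the UUIDs are hypothetical and the
credential handling is just the usual keystoneauth environment-variable
pattern.

    # Illustration: move a Queens-style VGPU allocation from the root
    # (compute node) RP to a Rocky-style child RP. UUIDs are made up.
    import os

    from keystoneauth1 import adapter, session
    from keystoneauth1.identity import v3

    CONSUMER = 'instance-uuid'       # the instance consuming the VGPU
    ROOT_RP = 'compute-node-rp-uuid'
    VGPU_CHILD_RP = 'vgpu-type-child-rp-uuid'

    auth = v3.Password(
        auth_url=os.environ['OS_AUTH_URL'],
        username=os.environ['OS_USERNAME'],
        password=os.environ['OS_PASSWORD'],
        project_name=os.environ['OS_PROJECT_NAME'],
        user_domain_name='Default',
        project_domain_name='Default')
    placement = adapter.Adapter(
        session.Session(auth=auth), service_type='placement',
        interface='public')
    # Microversion 1.12+ keys allocations by resource provider UUID.
    headers = {'OpenStack-API-Version': 'placement 1.12'}

    body = placement.get('/allocations/%s' % CONSUMER,
                         headers=headers).json()
    # Rebuild the allocations keeping only the resources; the GET
    # response also carries RP generations, which PUT doesn't take.
    allocs = {rp: {'resources': dict(a['resources'])}
              for rp, a in body['allocations'].items()}
    # Move the VGPU piece from the root RP to the child RP.
    vgpu = allocs[ROOT_RP]['resources'].pop('VGPU')
    allocs.setdefault(VGPU_CHILD_RP, {'resources': {}})
    allocs[VGPU_CHILD_RP]['resources']['VGPU'] = vgpu

    placement.put('/allocations/%s' % CONSUMER, headers=headers,
                  json={'allocations': allocs,
                        'project_id': body['project_id'],
                        'user_id': body['user_id']})

Whether heal_allocations (or something else in the upgrade path) should
learn to do that kind of move itself is exactly what I'm wondering
about.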
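FWIW, for the "placement was down when the instance was deleted" case
above, I assume the manual cleanup today would look something like this
with the osc-placement plugin (the UUID is again hypothetical):

    # inspect what the deleted instance is still consuming
    openstack resource provider allocation show $INSTANCE_UUID
    # drop the leaked allocation
    openstack resource provider allocation delete $INSTANCE_UUID

Having heal_allocations grow a mode that takes a specific server ID and
does this lookup itself, as suggested above, would obviously be nicer.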
_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators