I do intend to respond to all the excellent discussion on this thread, but right now I just want to offer an update on the code:
I've split the effort into multiple changes starting at [1]. A few of
these are ready for review.

One opinion was that a specless blueprint would be appropriate. If
there's consensus on this, I'll spin one up.

[1] https://review.openstack.org/#/c/615606/

On 11/5/18 03:16, Belmiro Moreira wrote:
> Thanks Eric for the patch.
> This will help keep placement calls under control.
>
> Belmiro
>
> On Sun, Nov 4, 2018 at 1:01 PM Jay Pipes <jaypi...@gmail.com
> <mailto:jaypi...@gmail.com>> wrote:
>
> On 11/02/2018 03:22 PM, Eric Fried wrote:
> > All-
> >
> > Based on a (long) discussion yesterday [1] I have put up a patch [2]
> > whereby you can set [compute]resource_provider_association_refresh to
> > zero and the resource tracker will never* refresh the report client's
> > provider cache. Philosophically, we're removing the "healing" aspect
> > of the resource tracker's periodic and trusting that placement won't
> > diverge from whatever's in our cache. (If it does, it's because the op
> > hit the CLI, in which case they should SIGHUP - see below.)
> >
> > *except:
> > - When we initially create the compute node record and bootstrap its
> >   resource provider.
> > - When the virt driver's update_provider_tree makes changes,
> >   update_from_provider_tree reflects them in the cache as well as
> >   pushing them back to placement.
> > - If update_from_provider_tree fails, the cache is cleared and gets
> >   rebuilt on the next periodic.
> > - If you send SIGHUP to the compute process, the cache is cleared.
> >
> > This should dramatically reduce the number of calls to placement from
> > the compute service. Like, to nearly zero, unless something is
> > actually changing.
> >
> > Can I get some initial feedback as to whether this is worth polishing
> > up into something real? (It will probably need a bp/spec if so.)
> >
> > [1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-11-01.log.html#t2018-11-01T17:32:03
> > [2] https://review.openstack.org/#/c/614886/
> >
> > ==========
> > Background
> > ==========
> > In the Queens release, our friends at CERN noticed a serious spike in
> > the number of requests to placement from compute nodes, even in a
> > stable-state cloud. Given that we were in the process of adding a ton
> > of infrastructure to support sharing and nested providers, this was
> > not unexpected. Roughly, what was previously:
> >
> >   @periodic_task:
> >       GET /resource_providers/$compute_uuid
> >       GET /resource_providers/$compute_uuid/inventories
> >
> > became more like:
> >
> >   @periodic_task:
> >       # In Queens/Rocky, this would still just return the compute RP
> >       GET /resource_providers?in_tree=$compute_uuid
> >       # In Queens/Rocky, this would return nothing
> >       GET /resource_providers?member_of=...&required=MISC_SHARES...
> >       for each provider returned above:  # i.e. just one in Q/R
> >           GET /resource_providers/$compute_uuid/inventories
> >           GET /resource_providers/$compute_uuid/traits
> >           GET /resource_providers/$compute_uuid/aggregates
> >
> > In a cloud the size of CERN's, the load wasn't acceptable. But at the
> > time, CERN worked around the problem by disabling refreshing entirely.
> > (The fact that this seems to have worked for them is an encouraging
> > sign for the proposed code change.)
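To make the semantics of setting
[compute]resource_provider_association_refresh to zero concrete, here's
a rough, hypothetical sketch of the throttling idea -- the class and
method names below are made up for illustration and are not the actual
report client code:

    # Illustrative only: a refresh interval of 0 means "never refresh
    # from the periodic task"; the cache is then rebuilt only by the
    # explicit triggers listed above (bootstrap, SIGHUP,
    # update_from_provider_tree failure).
    import time

    class ProviderCacheRefresher(object):
        def __init__(self, refresh_interval, refresh_cb):
            # refresh_interval: seconds between refreshes; 0 disables.
            # refresh_cb: callable that re-pulls provider data from
            # placement and repopulates the local cache.
            self.refresh_interval = refresh_interval
            self.refresh_cb = refresh_cb
            self._last_refresh = 0

        def maybe_refresh(self):
            # Called from the periodic task; only refresh when due.
            if not self.refresh_interval:
                return False
            now = time.time()
            if now - self._last_refresh >= self.refresh_interval:
                self.refresh_cb()
                self._last_refresh = now
                return True
            return False

With the interval at 0 the periodic path becomes a no-op, so
steady-state placement traffic from the compute service drops to those
explicit triggers only.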
> > We're not actually making use of most of that information, but it sets
> > the stage for things that we're working on in Stein and beyond, like
> > multiple VGPU types, bandwidth resource providers, accelerators, NUMA,
> > etc., so removing/reducing the amount of information we look at isn't
> > really an option strategically.
>
> I support your idea of getting rid of the periodic refresh of the cache
> in the scheduler report client. Much of that was added in order to
> emulate the original way the resource tracker worked.
>
> Most of the behaviour in the original resource tracker (and some of the
> code still in there for dealing with (surprise!) PCI passthrough devices
> and NUMA topology) was due to doing allocations on the compute node (the
> whole claims stuff). We needed to always be syncing the state of the
> compute_nodes and pci_devices table in the cell database with whatever
> usage information was being created/modified on the compute nodes [0].
>
> All of the "healing" code that's in the resource tracker was basically
> to deal with "soft delete", migrations that didn't complete or work
> properly, and, again, to handle allocations becoming out-of-sync because
> the compute nodes were responsible for allocating (as opposed to the
> current situation we have where the placement service -- via the
> scheduler's call to claim_resources() -- is responsible for allocating
> resources [1]).
>
> Now that we have generation markers protecting both providers and
> consumers, we can rely on those generations to signal to the scheduler
> report client that it needs to pull fresh information about a provider
> or consumer. So, there's really no need to automatically and blindly
> refresh any more.
>
> Best,
> -jay
>
> [0] We always need to be syncing those tables because those tables,
> unlike the placement database's data modeling, couple both inventory AND
> usage in the same table structure...
>
> [1] again, except for PCI devices and NUMA topology, because of the
> tight coupling in place with the different resource trackers those types
> of resources use...
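And just to illustrate the generation-based pattern Jay describes: a
simplified, hypothetical sketch of refresh-on-conflict, not the real
report client -- the endpoint, payload shapes, and helper names are
assumptions, and microversion headers and error handling are omitted:

    # Illustrative only: re-read a provider from placement solely when
    # a write is rejected with a generation conflict (HTTP 409).
    import requests

    PLACEMENT = 'http://placement.example.com'  # made-up endpoint
    CACHE = {}  # provider uuid -> cached traits/generation view

    def _refresh(uuid):
        resp = requests.get(
            '%s/resource_providers/%s/traits' % (PLACEMENT, uuid))
        CACHE[uuid] = resp.json()

    def set_traits(uuid, traits):
        if uuid not in CACHE:
            _refresh(uuid)
        for _ in range(2):  # at most one refresh-and-retry
            payload = {
                'resource_provider_generation':
                    CACHE[uuid]['resource_provider_generation'],
                'traits': traits,
            }
            resp = requests.put(
                '%s/resource_providers/%s/traits' % (PLACEMENT, uuid),
                json=payload)
            if resp.status_code != 409:
                return resp
            # Conflict: our cached generation is stale, so pull fresh
            # data and try again -- no blind periodic refresh needed.
            _refresh(uuid)
        return resp

The cache only gets re-primed when placement actually tells us it's out
of date, which is exactly what the generation markers make possible.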