On Sun, 21 Feb 2016, Jay Pipes wrote:
> I don't see how the shared-state scheduler is getting the most accurate resource view. It is only in extreme circumstances that the resource-provider scheduler's view of the resources in a system (all of which is stored without caching in the database) would differ from the "actual" inventory on a compute node.
I'm pretty sure this paragraph is central to the whole discussion. It's a question of where the final truth lies and what that positioning allows and forbids. In resource-providers, the truth, or at least the truth that is acted upon, is in the database. In shared-state, the scheduler mirrors the resources. People have biases about that sort of stuff.

Generalizing quite a bit: all that mirroring costs quite a bit in communication terms and can go funky if the communication goes awry. But it does mean that the compute nodes are authoritative about themselves and have the possibility of using/claiming/placing resources that are not under the control of the scheduler (or even nova in general). Centralizing things in the DB cuts way back on messaging and appears to provide both a computationally and conceptually efficient way of calculating placement, but it does so at the cost of the compute nodes having less flexibility about managing their own resources, unless we want the failure mode you describe elsewhere to be more common than you implied.

I heard somewhere, but this may be wrong or out of date, that one of the constraints on compute nodes is that it should be possible to spawn VMs on them that are not managed by nova. If, in the full-blown version of the resource-provider-based scheduler, we are not sending resource usage updates to the scheduler DB on compute-node state changes, only on failure, then the retry rate goes up in a heterogeneous environment. That could well be fine, a price you pay, but I wonder if it is a concern?

I could get into some noodling here about the artifact world versus the real world, but that's probably belaboring the point. I'm not trying to diss or support either approach, just flesh out some of the gaps in at least my understanding.
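To make the tradeoff concrete, here's a toy sketch (all names are illustrative, none of this is nova's actual API) of the two claim models: a node-authoritative claim that can refuse, versus a DB-side claim that only discovers out-of-band usage when the boot actually fails:

```python
class ComputeNode:
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0  # may include VMs spawned outside nova

    def try_claim(self, amount):
        """Node-authoritative claim: the node sees all local usage
        and may refuse, so the scheduler retries elsewhere."""
        if self.used + amount > self.capacity:
            return False
        self.used += amount
        return True


class DbScheduler:
    """Resource-provider style: the truth acted upon lives in the
    scheduler's DB view of each node."""

    def __init__(self, nodes):
        # db_used tracks only scheduler-made claims; out-of-band
        # usage is invisible here until a placement fails.
        self.db_used = {n: 0 for n in nodes}

    def place(self, amount):
        for node, used in self.db_used.items():
            if used + amount <= node.capacity:
                # Claim committed in the DB view...
                self.db_used[node] = used + amount
                # ...but the node can still fail the actual boot if
                # unmanaged VMs have eaten the capacity.
                return node, node.try_claim(amount)
        return None, False
```

The failure mode in question: a node with capacity 10 and 8 units of unmanaged usage looks empty to the DB, so `place(4)` succeeds in the DB view but the boot fails on the node, triggering a retry.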
> b) Simplicity
>
> Goes to the above point about debuggability, but I've always tried to follow the mantra that the best software design is not when you've added the last piece to it, but rather when you've removed the last piece from it and still have a functioning and performant system. Having a scheduler that can tackle the process of tracking resources, deciding on placement, and claiming those resources instead of playing an intricate dance of keeping state caches valid will, IMHO, lead to a better scheduler.
I think it is moving in the right direction. Removing the dance of keeping state caches valid will be a big improvement. Better still would be removing the duplication and persistence of information that already exists on the compute nodes. That would be really cool, but doesn't yet seem possible with the way we do messaging, nor with the way we track shared resources (resource-pools ought to help).

--
Chris Dent (╯°□°)╯︵┻━┻ http://anticdent.org/
freenode: cdent tw: @anticdent
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev