Scheduling in the nova-scheduler service consists of two major phases:
A. Cache refresh, in code [1].
B. Filtering and weighing, in code [2].
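To make the two phases concrete, here is a minimal, self-contained sketch. All names here (`Host`, `refresh_host_states`, `schedule`, the filter/weigher callables) are illustrative, not nova's actual API:

```python
# Phase A refreshes the host-state cache; phase B filters and weighs it.

class Host:
    def __init__(self, name, free_ram_mb):
        self.name = name
        self.free_ram_mb = free_ram_mb

def refresh_host_states():
    # Phase A: in nova this is an expensive database round-trip performed
    # for every scheduling request; here we just return a static list.
    return [Host("node1", 2048), Host("node2", 512), Host("node3", 8192)]

def schedule(request, filters, weighers):
    hosts = refresh_host_states()                          # phase A
    candidates = [h for h in hosts
                  if all(f(h, request) for f in filters)]  # phase B: filter
    candidates.sort(key=lambda h: sum(w(h, request) for w in weighers),
                    reverse=True)                          # phase B: weigh
    return candidates

ram_filter = lambda h, req: h.free_ram_mb >= req["ram_mb"]
ram_weigher = lambda h, req: h.free_ram_mb

ranked = schedule({"ram_mb": 1024}, [ram_filter], [ram_weigher])
print([h.name for h in ranked])  # -> ['node3', 'node1']
```

Phase B is cheap pure-Python work over the in-memory host states; the cost discussed below is almost entirely in phase A.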
A couple of previous experiments [3][4] show that cache refresh is the major bottleneck of the nova scheduler. For example, page 15 of presentation [3] reports that cache refresh takes 98.5% of the time of the entire `_schedule` function [6] with 200-1000 nodes and 50+ concurrent requests. The latest experiments [5] in China Mobile's 1000-node environment confirm the same conclusion, and the figure rises to 99.7% with 40+ concurrent requests.

Here are some existing solutions for the cache-refresh bottleneck:
I. Caching scheduler.
II. Scheduler filters in DB [7].
III. Eventually consistent scheduler host state [8].

I can discuss their merits and drawbacks in a separate thread, but here I want to show the simplest solution, based on my findings during the experiments [5]. I wrapped the expensive function [1] to observe the behavior of cache refresh under pressure. Interestingly, a single cache refresh only costs about 0.3 seconds, but when there are concurrent cache-refresh operations this cost can suddenly jump to 8 seconds, and I have even seen it reach 60 seconds for one cache refresh under higher pressure. See the section below for details.

This raises a question about the current implementation: do we really need a cache-refresh operation [1] for *every* request? If those concurrent operations were replaced by one database query, the scheduler would still have the latest resource view from the database. The scheduler would even be better off, because the expensive cache-refresh operations are minimized and therefore fast again (about 0.3 seconds each). I believe this is the simplest optimization to scheduler performance: it doesn't require any changes to the filter scheduler, and minor improvements inside the host manager are enough.
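The idea of replacing concurrent cache-refresh operations with one shared database query can be sketched as request coalescing: the first caller performs the query while concurrent callers wait for and reuse its result. This is a hedged illustration under my own naming (`CoalescingCache`, `slow_fetch`), not the actual host manager code:

```python
import threading
import time

class CoalescingCache:
    """Coalesce concurrent refreshes into a single fetch of the backend."""

    def __init__(self, fetch_fn):
        self._fetch_fn = fetch_fn        # the expensive DB query
        self._lock = threading.Lock()
        self._refreshing = None          # Event set when a refresh finishes
        self._state = None

    def get(self):
        with self._lock:
            if self._refreshing is None:
                # First caller becomes the leader and starts a refresh.
                self._refreshing = done = threading.Event()
                leader = True
            else:
                # Concurrent callers just wait for the leader's result.
                done = self._refreshing
                leader = False
        if leader:
            state = self._fetch_fn()
            with self._lock:
                self._state = state
                self._refreshing = None
            done.set()
        else:
            done.wait()
        return self._state

# Demo: 10 concurrent callers, but ideally only one database query.
calls = []
def slow_fetch():
    calls.append(1)
    time.sleep(0.3)          # simulate the ~0.3s cache-refresh query
    return "host states"

cache = CoalescingCache(slow_fetch)
results = []
threads = [threading.Thread(target=lambda: results.append(cache.get()))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls), len(results))
```

Every caller still gets a view no older than the refresh in flight, so freshness is preserved while the database sees one query per batch instead of one per request.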
[1] https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L104
[2] https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L112-L123
[3] https://www.openstack.org/assets/presentation-media/7129-Dive-into-nova-scheduler-performance-summit.pdf
[4] http://lists.openstack.org/pipermail/openstack-dev/2016-June/098202.html
[5] Please refer to Barcelona summit session ID 15334 later: "A tool to test and tune your OpenStack Cloud? Sharing our 1000 node China Mobile experience."
[6] https://github.com/openstack/nova/blob/master/nova/scheduler/filter_scheduler.py#L53
[7] https://review.openstack.org/#/c/300178/
[8] https://review.openstack.org/#/c/306844/

****** Here is the discovery from the latest experiments [5] ******
https://docs.google.com/document/d/1N_ZENg-jmFabyE0kLMBgIjBGXfL517QftX3DW7RVCzU/edit?usp=sharing

Figure 1 illustrates the concurrent cache-refresh operations in a nova-scheduler service; at most 23 requests are waiting for cache-refresh operations at time 43s. Figure 2 illustrates the time cost of every request in the same experiment; it shows that the cost increases with the growth of concurrency, confirming the vicious circle in which a request waits longer for the database when there are more waiting requests. Figures 3 and 4 illustrate a worse case, where a cache-refresh operation reaches 60 seconds because of excessive concurrent cache-refresh operations.

--
Regards
Yingxin

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev