Hi All,

When I was looking at bugs related to scheduler race conditions [1-3], it 
seemed to me that the nova scheduler lacks sanity checks of its scheduling 
decisions under different situations. We cannot even verify that a given fix 
mitigates race conditions to an acceptable degree. For example, there is no 
easy way to test whether server-group race conditions still exist after a fix 
for bug [1], to make sure that scheduling produces no violations of allocation 
ratios as reported in bug [2], or to check that the retry rate is acceptable in 
the various corner cases described in bug [3]. And the list goes on.

So I'm asking whether there is a plan to add such tests in the future, or 
whether a design exists to simplify writing and running those kinds of tests. 
I'm thinking of using fake databases and fake interfaces to isolate the entire 
scheduler service, so that we can easily build a disposable environment with 
all kinds of fake resources and fake compute nodes to test scheduler behavior. 
It would also be a good way to test whether the scheduler can scale to 10k 
nodes without setting up 10k real compute nodes.
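To make the idea concrete, here is a minimal, self-contained sketch of what such a disposable fake environment could look like. It does not use any actual nova internals; `FakeHostState` and `select_host` are hypothetical stand-ins for the scheduler's host state and filtering/weighing logic, and the numbers are arbitrary:

```python
# Hypothetical sketch: drive scheduler-like placement decisions against
# fake in-memory host states instead of real compute nodes.

class FakeHostState(object):
    """A fake compute node with simple free-resource counters."""
    def __init__(self, name, vcpus, ram_mb):
        self.name = name
        self.free_vcpus = vcpus
        self.free_ram_mb = ram_mb

def select_host(hosts, req_vcpus, req_ram_mb):
    """Pick the fitting fake host with the most free RAM and claim it."""
    candidates = [h for h in hosts
                  if h.free_vcpus >= req_vcpus and h.free_ram_mb >= req_ram_mb]
    if not candidates:
        return None
    best = max(candidates, key=lambda h: h.free_ram_mb)
    best.free_vcpus -= req_vcpus
    best.free_ram_mb -= req_ram_mb
    return best

# Build 10,000 disposable fake nodes in memory -- no real deployment needed.
hosts = [FakeHostState("node%05d" % i, vcpus=16, ram_mb=32768)
         for i in range(10000)]

# Schedule 1,000 fake instances and then sanity-check the decisions:
# every request was placed and no host was driven into over-commit.
for _ in range(1000):
    assert select_host(hosts, req_vcpus=2, req_ram_mb=4096) is not None
assert all(h.free_vcpus >= 0 and h.free_ram_mb >= 0 for h in hosts)
```

With the real scheduler wired to fakes like these, the same pattern could replay the racy corner cases from the bugs above and assert on the outcome.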

I'm also interested in the blueprint [4] for reducing scheduler race 
conditions at the green-thread level. I think it is a good starting point for 
tackling the larger racing problem in the nova scheduler, and I would really 
like to help with it.


[1] https://bugs.launchpad.net/nova/+bug/1423648
[2] https://bugs.launchpad.net/nova/+bug/1370207
[3] https://bugs.launchpad.net/nova/+bug/1341420
[4] https://blueprints.launchpad.net/nova/+spec/host-state-level-locking


Regards,
-Yingxin

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev