On Wed, Oct 31, 2012 at 12:21 AM, Jonathan Proulx <j...@csail.mit.edu> wrote:
> Hi All,
>
> I'm having what I consider serious issues with the scheduler in
> Folsom. It seems to relate to the introduction of threading in the
> scheduler.

How many scheduler instances do you have?

> For a number of local reasons we prefer to have instances start on the
> compute node with the least amount of free RAM that is still enough to
> satisfy the request, which is the reverse of the default policy of
> scheduling on the system with the most free RAM. I'm fairly certain
> the same behavior would be seen with that policy as well, and with any
> other policy that results in a "best" choice for scheduling the next
> instance.
>
> We have workloads that start hundreds of instances of the same image,
> and there are plans to scale this to thousands. What I'm seeing is
> something like this:
>
> * user submits an API request for 300 instances
> * scheduler puts them all on one node
> * the retry scheduler kicks in at some point for the 276 that don't fit
> * those 276 are all scheduled on the next "best" node
> * the retry cycle repeats with the 252 that don't fit there
>
> I'm not clear exactly where the RetryScheduler inserts itself (I
> should probably read it), but the first compute node is badly overloaded
> handling start-up requests, which results in a fair number of instances
> entering the "ERROR" state rather than rescheduling (so not all 276
> actually make it to the next round), and the whole process is painfully
> slow. In the end we are lucky to see 50% of the requested instances
> actually make it into the Active state (and then only because we
> increased scheduler_max_attempts).
>
> Is that really how it's supposed to work? With the introduction of
> the RetryScheduler as a fix for the scheduling race condition I think
> it is, but it is a pretty bad solution for me, unless I'm missing
> something. Am I? It wouldn't be the first time...
>
> For now I'm working around this by using the ChanceScheduler
> (compute_scheduler_driver=nova.scheduler.chance.ChanceScheduler) so
> the scheduler threads don't pick a "best" node. This is orders of
> magnitude faster and consistently successful in my tests. It is not
> ideal for us, as we have a small minority of compute nodes with twice
> the memory capacity of our standard nodes and we would prefer to keep
> those available for some of our extra-large memory flavors; we'd
> also like to minimize memory fragmentation on the standard-sized nodes
> for similar reasons.
>
> -Jon
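The cascade described above is easy to reproduce in a toy model: if each
scheduling round picks one "best" host from a stale view of the cloud, sends
every pending instance there, and only learns the real RAM consumption when
the failed boots come back as retries, then each round overfills exactly one
node and re-queues the rest. Here is a minimal sketch of that behaviour; it
is plain Python, deliberately not nova code, and the host sizes, flavor size
and attempt limit are made-up numbers chosen to line up with the 300/276/252
example above.

    # Toy model of the reported behaviour -- NOT the nova scheduler code.
    # One "best" host is chosen per round, every pending instance is sent
    # there, and the overflow comes back as retries for the next round.

    HOSTS = {"node%02d" % i: 48 * 1024 for i in range(1, 11)}  # MB free per host
    FLAVOR_MB = 2048       # hypothetical 2 GB flavor -> 24 fit on an empty host
    REQUESTED = 300
    MAX_ATTEMPTS = 3       # stand-in for scheduler_max_attempts

    def best_host(free):
        """Fill-first policy: least free RAM that can still fit one instance."""
        fits = {h: r for h, r in free.items() if r >= FLAVOR_MB}
        return min(fits, key=fits.get) if fits else None

    pending, placed = REQUESTED, 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        host = best_host(HOSTS)
        if host is None or pending == 0:
            break
        fit = min(pending, HOSTS[host] // FLAVOR_MB)   # how many really fit
        HOSTS[host] -= fit * FLAVOR_MB
        placed += fit
        print("attempt %d: %d sent to %s, %d fit, %d back for retry"
              % (attempt, pending, host, fit, pending - fit))
        pending -= fit
    print("placed=%d, never scheduled=%d" % (placed, pending))

Any deterministic "best" choice made from a stale view collapses the whole
batch onto one node like this, whichever weighting policy produced it;
randomising the choice (as the ChanceScheduler does) or charging each placed
instance against the host's free RAM before choosing again both break the
pattern.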
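For anyone else hitting this, the workaround amounts to something like the
following in nova.conf. The two option names are the ones already mentioned
in this thread; the value shown for scheduler_max_attempts is only a
placeholder, since the right number depends on how many hosts a single batch
may have to spill across.

    [DEFAULT]
    # Hand hosts out at random instead of ranking a single "best" node,
    # so parallel scheduling runs don't all converge on the same host.
    compute_scheduler_driver=nova.scheduler.chance.ChanceScheduler

    # How many times a failed boot is pushed back through the scheduler
    # before it gives up; placeholder value, tune for your batch sizes.
    scheduler_max_attempts=3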
--
Regards
Huang Zhiteng