Hi Folks, I was reviewing a code change to add generic retries for build failures ( https://review.openstack.org/#/c/9540/2 ), and wanted to be sure that it wouldn't invalidate the capacity accounting used by the scheduler.
However, I've been sitting here for a while working through the Folsom scheduler code trying to understand how capacity-based scheduling now works. I'm sure I'm missing something obvious, but I just can't work out where the free_ram_mb value in the compute_node table gets updated. I can see the database API method to update the values, compute_node_utilization_update(), but it doesn't look as if anything in the code ever calls it.

From when I last looked at this, and from various discussions here and at the design summits, I thought the approach was that:

- The scheduler would make a call (rather than a cast) to the compute manager, which would then do some verification work, update the DB table whilst in the context of that call, and then start a thread to complete the spawn. The need to go all the way to the compute node as a call was to avoid race conditions from multiple schedulers. (The change I'm looking at is part of a blueprint to avoid such a race, so maybe I imagined the change from cast to call?)

- On a delete, the capacity_notifier (which had to be configured into the list_notifier) would detect the delete message and decrement the database values.

But now that I look through the code, the scheduler still seems to be doing a cast (scheduler/driver), and, as noted, nothing appears to call compute_node_utilization_update(). The ram_filter in the scheduler seems to use the free_ram_mb value, and that value seems to come from the host_manager in the scheduler, which reads it from the database, but I can't work out where these values are updated in the database. The capacity_notifier, which used to decrement values on a VM deletion only (according to the comments, the increment was done in the scheduler), seems to have disappeared altogether in the move of the notifier to openstack/common.
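To make the flow I had in mind concrete, here is a rough Python sketch. It is not the actual Nova code: ComputeNodeTable, ComputeManager, run_instance(), handle_delete_notification() and InsufficientCapacity are all hypothetical names made up for illustration, and an in-memory dict with a lock stands in for the compute_node table. It only shows the idea of claiming RAM synchronously inside the call, spawning in a background thread, and giving the RAM back when a delete is seen.

```python
import threading


class InsufficientCapacity(Exception):
    """Raised when a host cannot fit the requested instance (hypothetical)."""


class ComputeNodeTable:
    """Hypothetical in-memory stand-in for the compute_node table."""

    def __init__(self, free_ram_mb):
        self._lock = threading.Lock()
        self._free_ram_mb = dict(free_ram_mb)

    def free_ram_mb(self, host):
        with self._lock:
            return self._free_ram_mb[host]

    def adjust(self, host, delta_mb):
        # Check and update under one lock so that two concurrent
        # requests cannot both claim the same RAM.
        with self._lock:
            new_value = self._free_ram_mb[host] + delta_mb
            if new_value < 0:
                raise InsufficientCapacity(host)
            self._free_ram_mb[host] = new_value
            return new_value


class ComputeManager:
    """Hypothetical compute manager for a single host."""

    def __init__(self, host, table):
        self.host = host
        self.table = table

    def run_instance(self, memory_mb):
        # Meant to be invoked as a blocking call (not a cast): the
        # capacity claim happens before the caller gets its reply.
        self.table.adjust(self.host, -memory_mb)
        worker = threading.Thread(target=self._spawn, args=(memory_mb,))
        worker.start()
        return worker

    def _spawn(self, memory_mb):
        # Placeholder for the real, slow build work.
        pass


def handle_delete_notification(table, host, memory_mb):
    # Assumed role of the old capacity notifier: give the RAM back
    # to the host when a delete message is seen.
    table.adjust(host, memory_mb)


table = ComputeNodeTable({"host1": 2048})
manager = ComputeManager("host1", table)
manager.run_instance(1536).join()      # claim succeeds, 512 MB left
try:
    manager.run_instance(1024)         # does not fit, claim is rejected
    second_claim_rejected = False
except InsufficientCapacity:
    second_claim_rejected = True
handle_delete_notification(table, "host1", 1536)
```

With a cast instead of a call, the scheduler would get no reply, so it could not learn that the second request was rejected and would have to rely on something else to keep free_ram_mb current.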
So I'm sure I'm missing some other even more cunning plan on how to keep the values current, but I can't for the life of me work out what it is - can someone fill me in please ? Thanks, Phil