Hi Folks,

I was reviewing a code change to add generic retries for build failures ( 
https://review.openstack.org/#/c/9540/2 ), and wanted to be sure that it 
wouldn't invalidate the capacity accounting used by the scheduler.


However, I've been working through the Folsom scheduler code for a while, 
trying to understand how capacity-based scheduling now works. I'm sure I'm 
missing something obvious, but I just can't work out where the free_ram_mb 
value in the compute_node table gets updated.



I can see the database API method to update the values, 
compute_node_utilization_update(), but it doesn't look as if anything in the 
code ever calls it.



From when I last looked at this, and from various discussions here and at the 
design summits, I thought the approach was that:

- The scheduler would make a call (rather than a cast) to the compute 
  manager, which would then do some verification work, update the DB table 
  whilst in the context of that call, and then start a thread to complete the 
  spawn. The need to go all the way to the compute node as a call was to avoid 
  race conditions from multiple schedulers. (The change I'm looking at is part 
  of a blueprint to avoid such a race, so maybe I imagined the change from 
  cast to call?)



- On a delete, the capacity_notifier (which had to be configured into the 
  list_notifier) would detect the delete message and decrement the database 
  values.
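
To make the approach I had in mind concrete, here is a rough sketch. It is not 
Nova code: the ComputeNode class, its method names and the running_vms counter 
are all made up for illustration, and I'm assuming that "decrement" on delete 
means dropping the usage count and handing the RAM back.

```python
import threading


class ComputeNode:
    """Toy stand-in for one compute_node row (hypothetical, not Nova's model)."""

    def __init__(self, free_ram_mb):
        self.free_ram_mb = free_ram_mb
        self.running_vms = 0
        self.spawn_threads = []
        self._lock = threading.Lock()

    def run_instance(self, memory_mb):
        """Synchronous 'call': verify and claim capacity before returning.

        Returns False if the host no longer has room, so the caller
        (the scheduler) can pick another host instead of over-committing.
        """
        with self._lock:
            if memory_mb > self.free_ram_mb:
                return False
            # Capacity is updated inside the call, before the reply is sent.
            self.free_ram_mb -= memory_mb
            self.running_vms += 1
        # The slow part of the build continues in a background thread.
        thread = threading.Thread(target=self._spawn, args=(memory_mb,))
        thread.start()
        self.spawn_threads.append(thread)
        return True

    def _spawn(self, memory_mb):
        """Placeholder for the actual instance build."""

    def on_delete(self, memory_mb):
        """What a delete notification handler would do in this sketch."""
        with self._lock:
            self.free_ram_mb += memory_mb
            self.running_vms -= 1
```

With a cast instead of a call, the scheduler would get no True/False reply, so 
two schedulers could both pick the same host from the same stale free_ram_mb.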



But now that I look through the code, it looks as if the scheduler is still 
doing a cast (scheduler/driver), and although I can see the database API call 
to update the values, compute_node_utilization_update(), it doesn't look as if 
anything in the code ever calls it.



The ram_filter scheduler seems to use the free_ram_mb value, and that value 
seems to come from the host_manager in the scheduler, which reads it from the 
database, but I can't for the life of me work out where these values are 
updated in the database.
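
As I understand it, the filtering step boils down to something like the sketch 
below. This is a simplification written for this mail, not the real ram_filter: 
the function name, the dict of host names to free_ram_mb, and the plain 
comparison are my assumptions.

```python
def ram_filter(hosts, requested_ram_mb):
    """Return the names of hosts whose recorded free RAM fits the request.

    `hosts` maps a host name to the free_ram_mb value last read from the
    database (hypothetical structure, for illustration only).
    """
    return [name for name, free_ram_mb in hosts.items()
            if free_ram_mb >= requested_ram_mb]


# Example: only node1 has room for a 1024 MB instance.
hosts = {"node1": 2048, "node2": 512}
candidates = ram_filter(hosts, 1024)  # ["node1"]
```

If nothing ever writes free_ram_mb back after a build, node1 keeps passing this 
check no matter how many instances land on it.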



The capacity_notifier, which used to decrement values only on a VM deletion 
(according to the comments, the increment was done in the scheduler), seems to 
have disappeared altogether in the move of the notifier to openstack/common?



So I'm sure I'm missing some other, even more cunning plan for keeping the 
values current, but I can't work out what it is. Can someone fill me in, 
please?



Thanks,

Phil
