On 06/17/2013 05:09 PM, Brian Elliott wrote:
>
> On Jun 17, 2013, at 3:50 PM, Chris Behrens <cbehr...@codestud.com> wrote:
>
>> On Jun 17, 2013, at 7:49 AM, Russell Bryant <rbry...@redhat.com> wrote:
>>
>>> On 06/16/2013 11:25 PM, Dugger, Donald D wrote:
>>>> Looking into the scheduler a bit there's an issue of duplicated effort
>>>> that is a little puzzling.  The database table `compute_nodes' is being
>>>> updated periodically with data about capabilities and resources used
>>>> (memory, vcpus, ...) while at the same time a periodic RPC call is
>>>> being made to the scheduler sending pretty much the same data.
>>>>
>>>> Does anyone know why we are updating the same data in two different
>>>> places using two different mechanisms?  Also, assuming we were to
>>>> remove one of these updates, which one should go?  (I thought at one
>>>> point in time there was a goal to create a database-free compute node,
>>>> which would imply we should remove the DB update.)
>>>
>>> Have you looked around to see if any code is using the data from the db?
>>>
>>> Having schedulers hit the db for the current state of all compute nodes
>>> all of the time would be a large additional db burden that I think we
>>> should avoid.  So, it makes sense to keep the rpc fanout_cast of current
>>> stats to schedulers.
>>
>> This is actually what the scheduler uses. :)  The fanout messages are
>> too infrequent and can be too laggy.  So, the scheduler was moved to
>> using the DB a long, long time ago… but it was very inefficient, at
>> first, because it looped through all instances.  So we added things we
>> needed into compute_node and compute_node_stats so we only had to look
>> at the hosts.  You have to pull the hosts anyway, so we pull the stats
>> at the same time.
>>
>> The problem is… when we stopped using certain data from the fanout
>> messages… we never removed it.  We should AT LEAST do this.  But… (see
>> below)…
>>
>>> The scheduler also does a fanout_cast to all compute nodes when it
>>> starts up to trigger the compute nodes to populate the cache in the
>>> scheduler.  It would be nice to never fanout_cast to all compute nodes
>>> (given that there may be a *lot* of them).  We could replace this with
>>> having the scheduler populate its cache from the database.
>>
>> I think we should audit the remaining things that the scheduler uses
>> from these messages and move them to the DB.  I believe it's limited to
>> the hypervisor capabilities to compare against aggregates or some such.
>> I believe it's things that change very rarely… so an alternative could
>> be to only send fanout messages when capabilities change!  We could
>> always do that as a first step.
>>
>>> Removing the db usage completely would be nice if nothing is actually
>>> using it, but we'd have to look into an alternative solution for
>>> removing the scheduler fanout_cast to compute.
>>
>> Relying on anything but the DB for current memory free, etc., is just
>> too laggy… so we need to stick with it, IMO.
>>
>> - Chris
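
To make Chris's "only send fanout messages when capabilities change" idea
above a bit more concrete, a compute-side check could look roughly like the
sketch below.  The names here (CapabilityReporter, publish_capabilities) are
made up for illustration and are not the actual nova code; the point is just
to remember the last payload we broadcast and skip the fanout_cast when
nothing has changed.

    # Illustration only: these names are hypothetical, not real nova code.
    # The compute node remembers the last capabilities payload it broadcast
    # and only does the (expensive) fanout when something actually changed.

    class CapabilityReporter(object):
        def __init__(self, publish_capabilities):
            # publish_capabilities: callable that does the RPC fanout_cast
            # to the schedulers.
            self._publish = publish_capabilities
            self._last_sent = None

        def periodic_update(self, current_capabilities):
            """Called from the existing periodic task on the compute node."""
            if current_capabilities != self._last_sent:
                self._publish(current_capabilities)
                # Copy so later in-place changes don't defeat the comparison.
                self._last_sent = dict(current_capabilities)

Steady-state fanout traffic would drop to nearly zero while the scheduler
still hears about capability changes promptly.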
> As Chris said, the reason it ended up this way using the DB is to quickly
> get up-to-date usage on hosts to the scheduler.  I certainly understand
> the point that it's a whole lot of increased load on the DB, but the RPC
> data was quite stale.
>
> If there is interest in moving away from the DB updates, I think we have
> to either:
>
> 1) Send RPC updates to the scheduler on essentially every state change
> during a build.
>
> or
>
> 2) Change the scheduler architecture so there is some "memory" of
> resources consumed between requests.  The scheduler would have to
> remember which hosts recent builds were assigned to.  This could be a bit
> of a data synchronization problem if you're talking about using multiple
> scheduler instances.
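
Just to spell out what option 2 implies, a per-scheduler "memory" of recent
claims could look roughly like this (again, made-up names for illustration,
not a proposal of actual code).  The synchronization problem shows up
immediately: each scheduler instance only remembers its own placements.

    # Illustration only: SchedulerMemory is a hypothetical name, not real
    # nova code.  Each scheduler keeps an in-memory view of free resources
    # per host and decrements it as soon as it picks a host, so back-to-back
    # builds don't all pile onto the same node while the DB/RPC state is
    # still catching up.

    import threading

    class SchedulerMemory(object):
        def __init__(self):
            self._free = {}   # host -> {'ram_mb': int, 'vcpus': int}
            self._lock = threading.Lock()

        def refresh(self, host, ram_mb, vcpus):
            # Called whenever fresh state for a host arrives (from the DB
            # or from an RPC update).
            with self._lock:
                self._free[host] = {'ram_mb': ram_mb, 'vcpus': vcpus}

        def claim(self, host, ram_mb, vcpus):
            # Called right after the scheduler assigns a build to a host.
            # Other scheduler instances never see this claim, which is the
            # synchronization problem mentioned above.
            with self._lock:
                self._free[host]['ram_mb'] -= ram_mb
                self._free[host]['vcpus'] -= vcpus

        def free_ram_mb(self, host):
            with self._lock:
                return self._free[host]['ram_mb']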
Thanks for the feedback.  Neither of these sounds too attractive to me.  I
think Chris's suggestion to audit the usage of the fanout messages and get
rid of them sounds like the best way forward to clean this up.

-- 
Russell Bryant

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev