On 06/19/2013 05:54 PM, Russell Bryant wrote:
> On 06/17/2013 05:09 PM, Brian Elliott wrote:
>> On Jun 17, 2013, at 3:50 PM, Chris Behrens <cbehr...@codestud.com> wrote:
>>> On Jun 17, 2013, at 7:49 AM, Russell Bryant <rbry...@redhat.com> wrote:
>>>> On 06/16/2013 11:25 PM, Dugger, Donald D wrote:
>>>>> Looking into the scheduler a bit, there's an issue of duplicated effort
>>>>> that is a little puzzling. The database table `compute_nodes' is being
>>>>> updated periodically with data about capabilities and resources used
>>>>> (memory, vcpus, ...) while at the same time a periodic RPC call is being
>>>>> made to the scheduler sending pretty much the same data.
>>>>>
>>>>> Does anyone know why we are updating the same data in two different places
>>>>> using two different mechanisms? Also, assuming we were to remove one of
>>>>> these updates, which one should go? (I thought at one point in time
>>>>> there was a goal to create a database-free compute node, which would imply
>>>>> we should remove the DB update.)
>>>>
>>>> Have you looked around to see if any code is using the data from the db?
>>>>
>>>> Having schedulers hit the db for the current state of all compute nodes
>>>> all of the time would be a large additional db burden that I think we
>>>> should avoid. So, it makes sense to keep the rpc fanout_cast of current
>>>> stats to schedulers.
>>>
>>> This is actually what the scheduler uses. :) The fanout messages are too
>>> infrequent and can be too laggy. So, the scheduler was moved to using the
>>> DB a long, long time ago… but it was very inefficient, at first, because it
>>> looped through all instances. So we added the things we needed into
>>> compute_node and compute_node_stats so we only had to look at the hosts.
>>> You have to pull the hosts anyway, so we pull the stats at the same time.
>>>
>>> The problem is… when we stopped using certain data from the fanout
>>> messages… we never removed it. We should AT LEAST do this. But..
>>> (see below)…
>>>
>>>> The scheduler also does a fanout_cast to all compute nodes when it
>>>> starts up to trigger the compute nodes to populate the cache in the
>>>> scheduler. It would be nice to never fanout_cast to all compute nodes
>>>> (given that there may be a *lot* of them). We could replace this with
>>>> having the scheduler populate its cache from the database.
>>>
>>> I think we should audit the remaining things that the scheduler uses from
>>> these messages and move them to the DB. I believe it's limited to the
>>> hypervisor capabilities to compare against aggregates or some such. I
>>> believe it's things that change very rarely… so an alternative can be to
>>> only send fanout messages when capabilities change! We could always do
>>> that as a first step.
>>>
>>>> Removing the db usage completely would be nice if nothing is actually
>>>> using it, but we'd have to look into an alternative solution for
>>>> removing the scheduler fanout_cast to compute.
>>>
>>> Relying on anything but the DB for current memory free, etc., is just too
>>> laggy… so we need to stick with it, IMO.
>>>
>>> - Chris
>>>
>>> _______________________________________________
>>> OpenStack-dev mailing list
>>> OpenStack-dev@lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>> As Chris said, the reason it ended up this way, using the DB, is to quickly
>> get up-to-date usage on hosts to the scheduler. I certainly understand the
>> point that it's a whole lot of increased load on the DB, but the RPC data
>> was quite stale. If there is interest in moving away from the DB updates, I
>> think we have to either:
>>
>> 1) Send RPC updates to the scheduler on essentially every state change
>> during a build.
>>
>> or
>>
>> 2) Change the scheduler architecture so there is some "memory" of resources
>> consumed between requests. The scheduler would have to remember which hosts
>> recent builds were assigned to.
>> This could be a bit of a data
>> synchronization problem if you're talking about using multiple scheduler
>> instances.
>
> Thanks for the feedback. Neither of these sounds too attractive to me.
>
> I think Chris' comment to audit the usage of the fanout messages and get
> rid of them sounds like the best way forward to clean this up.
Yeah, the fanout stuff was using the computes' periodic_tasks, which was too slow. Hooking the updates into the existing state-change notifications with a specific notification driver would be a better approach. Or, if we wanted to keep something /like/ the db, could we not use memcache? (as an optional driver)