On 06/17/2013 05:09 PM, Brian Elliott wrote:
>
> On Jun 17, 2013, at 3:50 PM, Chris Behrens <cbehr...@codestud.com> wrote:
>
>> On Jun 17, 2013, at 7:49 AM, Russell Bryant <rbry...@redhat.com> wrote:
>>
>>> On 06/16/2013 11:25 PM, Dugger, Donald D wrote:
>>>> Looking into the scheduler a bit there's an issue of duplicated effort
>>>> that is a little puzzling.  The database table `compute_nodes' is being
>>>> updated periodically with data about capabilities and resources used
>>>> (memory, vcpus, ...) while at the same time a periodic RPC call is
>>>> being made to the scheduler sending pretty much the same data.
>>>>
>>>> Does anyone know why we are updating the same data in two different
>>>> places using two different mechanisms?  Also, assuming we were to
>>>> remove one of these updates, which one should go?  (I thought at one
>>>> point in time there was a goal to create a database-free compute node,
>>>> which would imply we should remove the DB update.)
>>>
>>> Have you looked around to see if any code is using the data from the db?
>>>
>>> Having schedulers hit the db for the current state of all compute nodes
>>> all of the time would be a large additional db burden that I think we
>>> should avoid.  So, it makes sense to keep the rpc fanout_cast of current
>>> stats to schedulers.
>>
>> This is actually what the scheduler uses. :)  The fanout messages are
>> too infrequent and can be too laggy.  So, the scheduler was moved to
>> using the DB a long, long time ago… but it was very inefficient, at
>> first, because it looped through all instances.  So we added things we
>> needed into compute_node and compute_node_stats so we only had to look
>> at the hosts.  You have to pull the hosts anyway, so we pull the stats
>> at the same time.
>>
>> The problem is… when we stopped using certain data from the fanout
>> messages… we never removed it.  We should AT LEAST do this.  But… (see
>> below)…
>>
>>> The scheduler also does a fanout_cast to all compute nodes when it
>>> starts up to trigger the compute nodes to populate the cache in the
>>> scheduler.  It would be nice to never fanout_cast to all compute nodes
>>> (given that there may be a *lot* of them).  We could replace this with
>>> having the scheduler populate its cache from the database.
>>
>> I think we should audit the remaining things that the scheduler uses
>> from these messages and move them to the DB.  I believe it's limited to
>> the hypervisor capabilities to compare against aggregates or some such.
>> I believe it's things that change very rarely… so an alternative could
>> be to only send fanout messages when capabilities change!  We could
>> always do that as a first step.
>>
>>> Removing the db usage completely would be nice if nothing is actually
>>> using it, but we'd have to look into an alternative solution for
>>> removing the scheduler fanout_cast to compute.
>>
>> Relying on anything but the DB for current memory free, etc., is just
>> too laggy… so we need to stick with it, IMO.
>>
>> - Chris
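
To make Chris's "only send fanout messages when capabilities change" idea
above a bit more concrete, a compute-side check could look roughly like the
sketch below.  The names here (CapabilityReporter, publish_capabilities) are
made up for illustration and are not the actual nova code; the point is just
to remember the last payload we broadcast and skip the fanout_cast when
nothing has changed.

    # Illustration only: these names are hypothetical, not real nova code.
    # The compute node remembers the last capabilities payload it broadcast
    # and only does the (expensive) fanout when something actually changed.

    class CapabilityReporter(object):
        def __init__(self, publish_capabilities):
            # publish_capabilities: callable that does the RPC fanout_cast
            # to the schedulers.
            self._publish = publish_capabilities
            self._last_sent = None

        def periodic_update(self, current_capabilities):
            """Called from the existing periodic task on the compute node."""
            if current_capabilities != self._last_sent:
                self._publish(current_capabilities)
                # Copy so later in-place changes don't defeat the comparison.
                self._last_sent = dict(current_capabilities)

Steady-state fanout traffic would drop to nearly zero while the scheduler
still hears about capability changes promptly.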
> As Chris said, the reason it ended up this way using the DB is to quickly
> get up-to-date usage on hosts to the scheduler.  I certainly understand
> the point that it's a whole lot of increased load on the DB, but the RPC
> data was quite stale.
>
> If there is interest in moving away from the DB updates, I think we have
> to either:
>
> 1) Send RPC updates to the scheduler on essentially every state change
> during a build.
>
> or
>
> 2) Change the scheduler architecture so there is some "memory" of
> resources consumed between requests.  The scheduler would have to
> remember which hosts recent builds were assigned to.  This could be a bit
> of a data synchronization problem if you're talking about using multiple
> scheduler instances.
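
Just to spell out what option 2 implies, a per-scheduler "memory" of recent
claims could look roughly like this (again, made-up names for illustration,
not a proposal of actual code).  The synchronization problem shows up
immediately: each scheduler instance only remembers its own placements.

    # Illustration only: SchedulerMemory is a hypothetical name, not real
    # nova code.  Each scheduler keeps an in-memory view of free resources
    # per host and decrements it as soon as it picks a host, so back-to-back
    # builds don't all pile onto the same node while the DB/RPC state is
    # still catching up.

    import threading

    class SchedulerMemory(object):
        def __init__(self):
            self._free = {}   # host -> {'ram_mb': int, 'vcpus': int}
            self._lock = threading.Lock()

        def refresh(self, host, ram_mb, vcpus):
            # Called whenever fresh state for a host arrives (from the DB
            # or from an RPC update).
            with self._lock:
                self._free[host] = {'ram_mb': ram_mb, 'vcpus': vcpus}

        def claim(self, host, ram_mb, vcpus):
            # Called right after the scheduler assigns a build to a host.
            # Other scheduler instances never see this claim, which is the
            # synchronization problem mentioned above.
            with self._lock:
                self._free[host]['ram_mb'] -= ram_mb
                self._free[host]['vcpus'] -= vcpus

        def free_ram_mb(self, host):
            with self._lock:
                return self._free[host]['ram_mb']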
Thanks for the feedback.  Neither of these sounds too attractive to me.  I
think Chris's suggestion to audit the usage of the fanout messages and get
rid of them sounds like the best way forward to clean this up.

-- 
Russell Bryant

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev