On 06/19/2013 05:54 PM, Russell Bryant wrote:
> On 06/17/2013 05:09 PM, Brian Elliott wrote:
>> On Jun 17, 2013, at 3:50 PM, Chris Behrens <cbehr...@codestud.com> wrote:
>>> On Jun 17, 2013, at 7:49 AM, Russell Bryant <rbry...@redhat.com> wrote:
>>>> On 06/16/2013 11:25 PM, Dugger, Donald D wrote:
>>>>> Looking into the scheduler a bit, there's an issue of duplicated effort
>>>>> that is a little puzzling. The database table `compute_nodes' is being
>>>>> updated periodically with data about capabilities and resources used
>>>>> (memory, vcpus, ...) while at the same time a periodic RPC call is being
>>>>> made to the scheduler sending pretty much the same data.
>>>>>
>>>>> Does anyone know why we are updating the same data in two different places
>>>>> using two different mechanisms? Also, assuming we were to remove one of
>>>>> these updates, which one should go? (I thought at one point in time
>>>>> there was a goal to create a database-free compute node, which would imply
>>>>> we should remove the DB update.)
>>>>
>>>> Have you looked around to see if any code is using the data from the db?
>>>>
>>>> Having schedulers hit the db for the current state of all compute nodes
>>>> all of the time would be a large additional db burden that I think we
>>>> should avoid. So, it makes sense to keep the rpc fanout_cast of current
>>>> stats to schedulers.
>>>
>>> This is actually what the scheduler uses. :) The fanout messages are too
>>> infrequent and can be too laggy. So, the scheduler was moved to using the
>>> DB a long, long time ago… but it was very inefficient, at first, because it
>>> looped through all instances. So we added the things we needed into
>>> compute_node and compute_node_stats so we only had to look at the hosts.
>>> You have to pull the hosts anyway, so we pull the stats at the same time.
>>>
>>> The problem is… when we stopped using certain data from the fanout
>>> messages… we never removed it. We should AT LEAST do this. But..
>>> (see below)…
>>>
>>>> The scheduler also does a fanout_cast to all compute nodes when it
>>>> starts up to trigger the compute nodes to populate the cache in the
>>>> scheduler. It would be nice to never fanout_cast to all compute nodes
>>>> (given that there may be a *lot* of them). We could replace this with
>>>> having the scheduler populate its cache from the database.
>>>
>>> I think we should audit the remaining things that the scheduler uses from
>>> these messages and move them to the DB. I believe it's limited to the
>>> hypervisor capabilities to compare against aggregates or some such. I
>>> believe it's things that change very rarely… so an alternative can be to
>>> only send fanout messages when capabilities change! We could always do
>>> that as a first step.
>>>
>>>> Removing the db usage completely would be nice if nothing is actually
>>>> using it, but we'd have to look into an alternative solution for
>>>> removing the scheduler fanout_cast to compute.
>>>
>>> Relying on anything but the DB for current memory free, etc., is just too
>>> laggy… so we need to stick with it, IMO.
>>>
>>> - Chris
>>>
>>> _______________________________________________
>>> OpenStack-dev mailing list
>>> OpenStack-dev@lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>> As Chris said, the reason it ended up this way, using the DB, is to quickly
>> get up-to-date usage on hosts to the scheduler. I certainly understand the
>> point that it's a whole lot of increased load on the DB, but the RPC data
>> was quite stale. If there is interest in moving away from the DB updates, I
>> think we have to either:
>>
>> 1) Send RPC updates to the scheduler on essentially every state change
>> during a build.
>>
>> or
>>
>> 2) Change the scheduler architecture so there is some "memory" of resources
>> consumed between requests. The scheduler would have to remember which hosts
>> recent builds were assigned to.
>> This could be a bit of a data
>> synchronization problem if you're talking about using multiple scheduler
>> instances.
>
> Thanks for the feedback. Neither of these sounds too attractive to me.
>
> I think Chris' comment to audit the usage of the fanout messages and get
> rid of them sounds like the best way forward to clean this up.
Yeah, the fanout stuff was using the computes' periodic_tasks, which was too slow. Hooking the updates into the existing state-change notifications with a specific notification driver would be a better approach. Or, if we wanted to keep something /like/ the db, could we not use memcache? (as an optional driver)