On Jun 17, 2013, at 3:50 PM, Chris Behrens <cbehr...@codestud.com> wrote:
> On Jun 17, 2013, at 7:49 AM, Russell Bryant <rbry...@redhat.com> wrote:
>> On 06/16/2013 11:25 PM, Dugger, Donald D wrote:
>>> Looking into the scheduler a bit, there's an issue of duplicated effort
>>> that is a little puzzling. The database table `compute_nodes' is being
>>> updated periodically with data about capabilities and resources used
>>> (memory, vcpus, ...) while at the same time a periodic RPC call is being
>>> made to the scheduler sending pretty much the same data.
>>>
>>> Does anyone know why we are updating the same data in two different
>>> places using two different mechanisms? Also, assuming we were to remove
>>> one of these updates, which one should go? (I thought at one point in
>>> time there was a goal to create a database-free compute node, which
>>> would imply we should remove the DB update.)
>>
>> Have you looked around to see if any code is using the data from the db?
>>
>> Having schedulers hit the db for the current state of all compute nodes
>> all of the time would be a large additional db burden that I think we
>> should avoid. So, it makes sense to keep the rpc fanout_cast of current
>> stats to schedulers.
>
> This is actually what the scheduler uses. :) The fanout messages are too
> infrequent and can be too laggy. So, the scheduler was moved to using the
> DB a long, long time ago… but it was very inefficient at first, because it
> looped through all instances. So we added the things we needed into
> compute_nodes and compute_node_stats so we only had to look at the hosts.
> You have to pull the hosts anyway, so we pull the stats at the same time.
>
> The problem is… when we stopped using certain data from the fanout
> messages, we never removed it. We should AT LEAST do this. But (see
> below):
>
>>
>> The scheduler also does a fanout_cast to all compute nodes when it
>> starts up to trigger the compute nodes to populate the cache in the
>> scheduler.
>> It would be nice to never fanout_cast to all compute nodes
>> (given that there may be a *lot* of them). We could replace this with
>> having the scheduler populate its cache from the database.
>
> I think we should audit the remaining things that the scheduler uses from
> these messages and move them to the DB. I believe it's limited to the
> hypervisor capabilities to compare against aggregates or some such. I
> believe it's things that change very rarely… so an alternative could be to
> only send fanout messages when capabilities change! We could always do
> that as a first step.
>
>>
>> Removing the db usage completely would be nice if nothing is actually
>> using it, but we'd have to look into an alternative solution for
>> removing the scheduler fanout_cast to compute.
>
> Relying on anything but the DB for current memory free, etc., is just too
> laggy… so we need to stick with it, IMO.
>
> - Chris
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

As Chris said, the reason it ended up this way, using the DB, is to quickly
get up-to-date usage on hosts to the scheduler. I certainly understand the
point that it's a whole lot of increased load on the DB, but the RPC data
was quite stale. If there is interest in moving away from the DB updates, I
think we have to either:

1) Send RPC updates to the scheduler on essentially every state change
during a build, or

2) Change the scheduler architecture so there is some "memory" of resources
consumed between requests. The scheduler would have to remember which hosts
recent builds were assigned to. This could be a bit of a data
synchronization problem if you're talking about using multiple scheduler
instances.

Brian
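
For what it's worth, Chris's "only send fanout messages when capabilities
change" idea is cheap to sketch: the compute side remembers the last
capabilities it broadcast and skips the cast when nothing changed. This is
only an illustration, not Nova's actual code; `FakeSchedulerAPI` stands in
for the real RPC client, and the method names here are made up for the
example.

```python
# Hedged sketch: suppress the capabilities fanout unless something changed.
# All class and method names are illustrative, not Nova's real API.

class FakeSchedulerAPI(object):
    """Records casts instead of sending them over RPC (for illustration)."""
    def __init__(self):
        self.casts = []

    def update_service_capabilities(self, context, service, host, caps):
        self.casts.append((service, host, caps))


class CapabilityReporter(object):
    """Run from the compute node's periodic task."""
    def __init__(self, rpcapi, host):
        self.rpcapi = rpcapi
        self.host = host
        self._last_sent = None

    def report(self, context, capabilities):
        # Capabilities (hypervisor type/version, etc.) change rarely,
        # so most periodic ticks send nothing at all.
        if capabilities == self._last_sent:
            return False
        self.rpcapi.update_service_capabilities(
            context, 'compute', self.host, capabilities)
        self._last_sent = dict(capabilities)
        return True


rpcapi = FakeSchedulerAPI()
reporter = CapabilityReporter(rpcapi, 'node1')
caps = {'hypervisor_type': 'QEMU', 'hypervisor_version': 1004002}
sent_first = reporter.report(None, caps)         # first tick: cast goes out
sent_second = reporter.report(None, dict(caps))  # unchanged: suppressed
caps['hypervisor_version'] = 1005000
sent_third = reporter.report(None, caps)         # changed: cast again
```

With that in place, steady-state fanout traffic drops to (almost) zero,
while the DB keeps carrying the fast-changing usage numbers.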
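
To make option 2 concrete, here is a rough sketch of what that "memory"
could look like: the scheduler holds an in-memory view of free resources
per host, refreshed from the DB periodically, and decrements it as it
places builds so back-to-back requests don't all land on the same host.
Everything here is hypothetical naming for the sake of discussion, and it
deliberately ignores the multi-scheduler synchronization problem I
mentioned.

```python
# Hedged sketch of option 2: in-memory resource claims between requests.
# Names and fields are illustrative only.

class HostState(object):
    def __init__(self, host, free_ram_mb, free_vcpus):
        self.host = host
        self.free_ram_mb = free_ram_mb
        self.free_vcpus = free_vcpus

    def consume(self, ram_mb, vcpus):
        # Remember the claim locally instead of waiting for the DB/RPC
        # update from the compute node.
        self.free_ram_mb -= ram_mb
        self.free_vcpus -= vcpus


class InMemoryScheduler(object):
    def __init__(self):
        self.hosts = {}

    def refresh_from_db(self, rows):
        # Periodic resync; with multiple scheduler instances this is
        # exactly where they would disagree until the DB catches up.
        for row in rows:
            self.hosts[row['host']] = HostState(
                row['host'], row['free_ram_mb'], row['free_vcpus'])

    def select_host(self, ram_mb, vcpus):
        candidates = [h for h in self.hosts.values()
                      if h.free_ram_mb >= ram_mb and h.free_vcpus >= vcpus]
        if not candidates:
            return None
        # Naive weighing: most free RAM wins; then record the claim.
        best = max(candidates, key=lambda h: h.free_ram_mb)
        best.consume(ram_mb, vcpus)
        return best.host


sched = InMemoryScheduler()
sched.refresh_from_db([
    {'host': 'node1', 'free_ram_mb': 8192, 'free_vcpus': 4},
    {'host': 'node2', 'free_ram_mb': 6144, 'free_vcpus': 4},
])
first = sched.select_host(4096, 2)   # node1 has the most free RAM
second = sched.select_host(4096, 2)  # node1's claim is remembered
```

Without the `consume()` call, both requests above would pick node1; with
it, the second build spreads to node2 even though the DB hasn't been
updated yet. That is basically the staleness window the current DB polling
leaves open.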