On 7 May 2015 at 22:52, Joshua Harlow <harlo...@outlook.com> wrote:
> Hi all,
>
> In seeing the following:
>
> - https://review.openstack.org/#/c/169836/
> - https://review.openstack.org/#/c/163274/
> - https://review.openstack.org/#/c/138607/
>
> Vilobh and I are starting to come to the conclusion that the service group
> layers in nova really need to be cleaned up (without adding more features
> that only work in one driver), removed, or otherwise reworked. Spec[0] has
> interesting findings on this.
>
> A summary/highlights:
>
> * The zookeeper service driver in nova has probably been broken for one or
>   more releases, due to its use (via the evzookeeper[1] library) of
>   eventlet attributes that no longer exist. Evzookeeper only works with
>   eventlet < 0.17.1. Please refer to [0] for details.
> * The memcache service driver really only uses memcache for a tiny piece
>   of the service liveness information (and does a database service table
>   scan to get the list of services). Please refer to [0] for details.
> * Nova-manage service disable (CLI admin API) does interact with the
>   service group layer for the 'is_up'[3] API, but it also does a database
>   service table scan[4] to get the list of services, so it is inconsistent
>   with the service group driver API 'get_all'[2] view of what is
>   enabled/disabled. Please refer to [9][10] for the nova-manage service
>   enable/disable details.
> * Nova service delete (REST API) follows a similar broken pattern: it
>   avoids calling into the service group layer to delete a service, which
>   means it only works with the database layer[5] and is therefore
>   inconsistent with the service group 'get_all'[2] API.
>
> ^^ The above leaves both disable/delete unaware of any other backends that
> may manage service group data, for example zookeeper, memcache, redis,
> etc... Please refer to [6][7] for details.
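[Editor's note: the inconsistency described above can be illustrated with a minimal stdlib-only Python sketch. All names here are hypothetical stand-ins, not Nova's actual code: membership comes from a database table scan in one code path and from the servicegroup driver in another, so the two views can disagree.]

```python
# Stand-in for the DB service table that nova-manage scans[4].
DB_SERVICES = {"compute-1": {}, "compute-2": {}}

class Driver:
    """Hypothetical servicegroup driver; only joined hosts are members."""
    def __init__(self):
        self._members = set()

    def join(self, host):
        self._members.add(host)

    def is_up(self, host):
        return host in self._members

    def get_all(self):
        return sorted(self._members)

driver = Driver()
driver.join("compute-1")        # compute-2 never joined via the driver

db_view = sorted(DB_SERVICES)   # what a DB table scan would return
driver_view = driver.get_all()  # what the servicegroup 'get_all' returns
print(db_view)      # ['compute-1', 'compute-2']
print(driver_view)  # ['compute-1'] -- two different views of "the services"
```

Because the two code paths consult different sources of truth, an admin tool scanning the table and an API call going through the driver can report different sets of services, which is exactly the mismatch called out above.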
> Ideally the API should follow the model used in [8], so that the
> extension, the admin interface, and the API interface all use the same
> servicegroup interface, which should be *fully* responsible for managing
> services. Doing so gives us a consistent view of service data, liveness,
> disabled/enabled state, and so on.
>
> So, with no disrespect to the authors of 169836 and 163274 (or anyone
> else involved), I am wondering if we can put in a request to figure out
> how to get the foundation of the service group concepts stabilized (or
> otherwise resolved) before adding more features that only work with the
> DB layer.
>
> What is the path to request some kind of larger coordination effort by
> the nova folks to fix the service group layers (and the concepts that are
> disjoint/don't work across them) before continuing to add features on top
> of a 'shaky' foundation?
>
> If I could propose something, it would probably work out like the
> following:
>
> Step 0: Figure out whether the service group API + layer(s) should be
> maintained/tweaked at all (nova-core decides?)
>
> If we maintain it:
>
> - Have an agreement that the nova service extension, the admin interface
>   (nova-manage) and the API all go through a common path for
>   update/delete/read.
>   * This common path should likely be the servicegroup API, so as to
>     have a consistent view of the data; it also helps nova add different
>     data-stores (keeping the services data in a DB and getting numerous
>     liveness updates every few seconds from N computes, where N is pretty
>     high, can be detrimental to Nova's performance).
> - At the same time, allow 163274 to be worked on (since it fixes an
>   edge-case that was asked about when the delete API was initially added
>   in its first code commit @ https://review.openstack.org/#/c/39998/).
> - Delay 169836 until the above two/three are fixed (and stabilized); its
>   'down' concept (and all the other usages of services that hit the
>   database, mentioned above) will need to go through the same service
>   group foundation that is currently being skipped.
>
> Else:
>
> - Discard 138607 and start removing the service group code (and just use
>   the DB for all the things).
> - Allow 163274 and 169836 (since those would be additions on top of the
>   DB layer that will be preserved).
>
> Thoughts?
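[Editor's note: the "common path" proposed above could be sketched roughly as follows. This is a hedged illustration with hypothetical class names, not Nova's actual classes: every interface (REST extension, nova-manage, internal callers) goes through one facade, and only the pluggable driver touches storage, so the backend can be swapped without callers seeing different views.]

```python
class DbDriver:
    """Hypothetical pluggable backend; only this class touches storage."""
    def __init__(self):
        self._services = {}

    def join(self, host):
        self._services[host] = {"disabled": False}

    def get_all(self):
        return sorted(self._services)

    def set_disabled(self, host, disabled):
        self._services[host]["disabled"] = disabled

    def delete(self, host):
        del self._services[host]

class ServiceGroupAPI:
    """Single entry point for every interface (REST, CLI, internal)."""
    def __init__(self, driver):
        self._driver = driver

    def join(self, host):
        self._driver.join(host)

    def get_all(self):
        return self._driver.get_all()

    def disable(self, host):
        self._driver.set_disabled(host, True)

    def delete(self, host):
        self._driver.delete(host)

api = ServiceGroupAPI(DbDriver())
api.join("compute-1")
api.join("compute-2")
api.delete("compute-2")   # a REST delete goes through the same path...
print(api.get_all())      # ...so 'get_all' agrees: ['compute-1']
```

The design point is simply that delete/disable and get_all consult the same driver, so they cannot drift apart the way the DB-scan and driver code paths currently can.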
I wonder about this approach:

* I think we need to go back and document what we want from the "service
  group" concept.
* Then we look at the best approach to implement that concept.
* Then look at the best way to get to a happy place from where we are now,
  ** noting we will need a "live" upgrade path for (at least) the most
     widely used drivers.

Does that make any sense?

Things that pop into my head include:

* Operators have been asking questions like: "Should new services not be
  'disabled' by default?" and "Can't my admins tell you that I just killed
  it?"
* From the scheduler's point of view, how do we interact with the provider
  that tells us whether something is alive or not?
* From the RPC API point of view, do we want to send a cast to something we
  know is dead? Maybe we do? Should we wait for calls to time out, or give
  up quicker?
* Polling the DB kinda sucks, although it sort of works for small deploys
  (and cells-based deploys). Being a separate DB from Nova would help some,
  but should we force another external dependency on all users? It's hard
  enough to set things up already.
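[Editor's note: the "is something alive?" question above usually reduces to a TTL-style heartbeat check, which is roughly how a DB-polling driver decides liveness: a service reports in periodically, and is_up() is just "last heartbeat recent enough". The sketch below uses hypothetical names and an explicit clock for determinism; the threshold value is illustrative, not Nova's actual configuration.]

```python
class Heartbeats:
    """Hypothetical TTL-based liveness tracker."""
    def __init__(self, down_threshold=60.0):
        self._last = {}                     # host -> last heartbeat time
        self._down_threshold = down_threshold

    def beat(self, host, now):
        """Record a heartbeat from 'host' at time 'now' (seconds)."""
        self._last[host] = now

    def is_up(self, host, now):
        """A host is 'up' if it heartbeated within the threshold."""
        last = self._last.get(host)
        return last is not None and (now - last) <= self._down_threshold

hb = Heartbeats(down_threshold=60.0)
hb.beat("compute-1", now=0.0)
print(hb.is_up("compute-1", now=30.0))   # True: heartbeat was 30s ago
print(hb.is_up("compute-1", now=90.0))   # False: silent past the threshold
```

This framing makes the trade-offs above concrete: the threshold bounds how stale the scheduler's "alive" view can be, and with a DB backend every beat() is a write plus every is_up() a read, which is where the polling cost comes from.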
Thanks,
John

> - Josh (and Vilobh, who is spending the most time on this recently)
>
> [0] Replace service group with tooz:
>     https://review.openstack.org/#/c/138607/
> [1] https://pypi.python.org/pypi/evzookeeper/
> [2] https://github.com/openstack/nova/blob/stable/kilo/nova/servicegroup/api.py#L93
> [3] https://github.com/openstack/nova/blob/stable/kilo/nova/servicegroup/api.py#L87
> [4] https://github.com/openstack/nova/blob/master/nova/cmd/manage.py#L711
> [5] https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/contrib/services.py#L106
> [6] https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/contrib/services.py#L107
> [7] https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3436
> [8] https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/contrib/services.py#L61
> [9] Nova manage enable:
>     https://github.com/openstack/nova/blob/master/nova/cmd/manage.py#L742
> [10] Nova manage disable:
>     https://github.com/openstack/nova/blob/master/nova/cmd/manage.py#L756

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev