Re: [openstack-dev] Nova scheduler startup when database is not available

Morgan Fainberg Wed, 23 Dec 2015 17:39:49 -0800

On Wed, Dec 23, 2015 at 10:32 AM, Jay Pipes <jaypi...@gmail.com> wrote:


> On 12/23/2015 12:27 PM, Lars Kellogg-Stedman wrote:
>
>> I've been looking into the startup constraints involved when launching
>> Nova services with systemd using Type=notify (which causes systemd to
>> wait for an explicit notification from the service before considering
>> it to be "started").  Some services (e.g., nova-conductor) will happily
>> "start" even if the backing database is currently unavailable (and
>> will enter a retry loop waiting for the database).
>>
>> Other services -- specifically, nova-scheduler -- will block waiting
>> for the database *before* providing systemd with the necessary
>> notification.
>>
>> nova-scheduler blocks because it wants to initialize a list of
>> available aggregates (in scheduler.host_manager.HostManager.__init__),
>> which it gets by calling objects.AggregateList.get_all.
>>
>> Does it make sense to block service startup at this stage?  The
>> database disappearing during runtime isn't a hard error -- we will
>> retry and reconnect when it comes back -- so should the same situation
>> at startup be a hard error?  As an operator, I am more interested in
>> "did my configuration files parse correctly?" at startup, and would
>> generally prefer the service to start (and permit any dependent
>> services to start) even when the database isn't up (because that's
>> probably a situation of which I am already aware).
>>
>
> If your configuration file parsed correctly but has the wrong database
> connection URI, what good is the service in an active state? It won't be
> able to do anything at all.
>
> This is why I think it's better to have hard checks, such as for database
> connections, on startup and not have services active if they won't be able
> to do anything useful.
>
>
Are you advocating that the scheduler bail out and cease to run, or that it
simply not mark itself as active? I am in favour of the second scenario but
not the first. There are cases where it would be nice to start the
scheduler and have it at least report "hey, I can't contact the DB" without
marking itself active, then keep running and retry the connection every
<interval>.

It isn't clear which level of "hard check" you're advocating in your
response and I want to clarify for the sake of conversation.
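
To make the second scenario concrete, here is a minimal sketch of "keep
running, report the failure, and only mark active once the DB answers".
It is not nova code: wait_for_database, check_db, and notify_ready are
hypothetical names, and sd_notify below is a hand-rolled stand-in for the
systemd notification that Type=notify waits for (it just writes READY=1 to
the socket named in NOTIFY_SOCKET).

```python
import os
import socket
import time


def sd_notify(state="READY=1"):
    """Send a state string to systemd, if we were started by it.

    Returns False when NOTIFY_SOCKET is unset (e.g. run by hand).
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    if addr.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), addr)
    return True


def wait_for_database(check_db, notify_ready=sd_notify, interval=5.0,
                      log=print, sleep=time.sleep, max_attempts=None):
    """Retry check_db until it succeeds, then mark the service active.

    check_db is assumed to raise ConnectionError while the database is
    unreachable. The process stays alive and logs each failure; it is
    only reported as ready after the first successful check.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            check_db()
        except ConnectionError as exc:
            log("can't contact the DB (attempt %d): %s" % (attempts, exc))
            if max_attempts is not None and attempts >= max_attempts:
                return False  # caller decides whether to bail out
            sleep(interval)
            continue
        notify_ready()
        return True
```

With max_attempts left at None this never bails out, which is the
behaviour I am in favour of; setting it gives the harder fail-fast variant.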


>> It would be relatively easy to have the scheduler lazy-load the list
>> of aggregates on first reference, rather than at __init__.
>>
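
For illustration, the lazy-load idea could look roughly like the sketch
below. This is not the actual nova HostManager: LazyHostManager and
load_aggregates are hypothetical names, and in nova the loader would
presumably be something like objects.AggregateList.get_all.

```python
class LazyHostManager:
    """Hypothetical stand-in for a host manager that defers its DB read."""

    def __init__(self, load_aggregates):
        # Nothing touches the database here, so construction cannot
        # block on an unavailable database.
        self._load_aggregates = load_aggregates
        self._aggregates = None

    @property
    def aggregates(self):
        # The first reference triggers the load. If the loader raises,
        # nothing is cached and the next reference tries again.
        if self._aggregates is None:
            self._aggregates = self._load_aggregates()
        return self._aggregates
```

Note that this moves any connection failure from startup to first use,
which is exactly the trade-off being debated here.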
>
> Sure, but if the root cause of the issue is a misconfigured connection
> string, then that lazy-load will just bomb out and the scheduler will be
> useless anyway. I'd rather have a fail-early/fast occur here than a
> fail-late.
>
> Best,
> -jay
>
>> I'm not familiar enough with the nova code to know if there would be
>> any undesirable implications of this behavior.  We're already punting
>> initializing the list of instances to an asynchronous task in order to
>> avoid blocking service startup.
>>
>> Does it make sense to permit nova-scheduler to complete service
>> startup in the absence of the database (and then retry the connection
>> in the background)?
>>
>>
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
