On 2/6/2018 2:14 PM, Chris Apsey wrote:
> but we would rather have intermittent build failures than compute nodes falling over in the future.

Note that once a compute node has a successful build, the consecutive-build-failure counter is reset. So if your limit is the default (10) and a node hits 10 failures in a row, its compute service is auto-disabled. But if it has, say, 5 failures followed by a success, the counter resets to 0.
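For reference, the threshold is controlled by the `consecutive_build_service_disable_threshold` option in the `[compute]` section of nova.conf; a sketch (check the config reference for your release):

```ini
[compute]
# Number of consecutive build failures before the compute service
# disables itself. Setting this to 0 turns the auto-disable
# behavior off entirely.
consecutive_build_service_disable_threshold = 10
```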

Obviously, if you're using a pack-first scheduling strategy rather than spreading instances across the deployment, a burst of failures can easily disable a compute node, especially if that host is overloaded like you saw. I'm not sure whether rescheduling is helping you or not - that would be useful information, since we consider the need to reschedule off a failed compute host a bad thing. When this idea came up at the Forum in Boston, it was specifically for the case where operators in the room didn't want a bad compute node to become a "black hole" in their deployment, causing lots of reschedules until they got that one node fixed.
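If a node does get auto-disabled, it can be re-enabled manually once you've dealt with the underlying problem; a sketch using the OpenStack client (the hostname is a placeholder):

```shell
# List compute services and their status; auto-disabled hosts
# show as "disabled" with a disabled reason.
openstack compute service list --service nova-compute

# Re-enable nova-compute on the affected host
# (replace compute-01 with your actual hostname).
openstack compute service set --enable compute-01 nova-compute
```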

--

Thanks,

Matt

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
