Re: [openstack-dev] [nova] How to debug no valid host failures with placement

Jay Pipes Thu, 02 Aug 2018 17:28:13 -0700

On 08/02/2018 06:18 PM, Michael Glasgow wrote:

On 08/02/18 15:04, Chris Friesen wrote:
On 08/02/2018 01:04 PM, melanie witt wrote:
The problem is an infamous one, which is, your users are trying to boot
instances and they get "No Valid Host" and an instance in ERROR state. They contact support, and now support is trying to determine why NoValidHost happened. In the past, they would turn on DEBUG log level on the nova-scheduler, try another request, and take a look at the scheduler logs.
At a previous Summit[1] there were some operators that said they just always ran nova-scheduler with debug logging enabled in order to deal with this issue, but that it was a pain [...]
I would go a bit further and say it's likely to be unacceptable on a large cluster. It's expensive to deal with all those logs and to manually comb through them for troubleshooting this issue type, which can happen frequently with some setups. Secondarily there are performance and security concerns with leaving debug on all the time.
As to "defining the problem", I think it's what Melanie said. It's about asking for X and the system saying, "sorry, can't give you X" with no further detail or even means of discovering it.
More generally, any time a service fails to deliver a resource which it is primarily designed to deliver, it seems to me at this stage that should probably be taken a bit more seriously than just "check the log file, maybe there's something in there?" From the user's perspective, if nova fails to produce an instance, or cinder fails to produce a volume, or neutron fails to build a subnet, that's kind of a big deal, right?
In such cases, would it be possible to generate a detailed exception object which contains all the necessary info to ascertain why that specific failure occurred?

It's not an exception. It's normal course of events. NoValidHosts means there were no compute nodes that met the requested resource amounts.

There's plenty of ways the operator can get usage and trait information and determine if there are providers that meet the requested amounts and required/forbidden traits.
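To make that kind of check concrete, here is a minimal sketch, not placement's actual code or API. It assumes the operator has already collected each provider's inventory, usage and traits into plain dictionaries; the function name, the data layout and the sample numbers are all invented for illustration, and it ignores allocation ratios and reserved amounts.

```python
def matching_providers(providers, resources, required=(), forbidden=()):
    """Return names of providers that can satisfy the request.

    providers: list of dicts with hypothetical keys "name", "inventory"
               (resource class -> {"total": int, "used": int}) and "traits".
    resources: requested amounts, e.g. {"VCPU": 4, "MEMORY_MB": 4096}.
    """
    matches = []
    for p in providers:
        # Free capacity per resource class (simplified: total minus used).
        free = {rc: inv["total"] - inv["used"] for rc, inv in p["inventory"].items()}
        if any(free.get(rc, 0) < amount for rc, amount in resources.items()):
            continue  # not enough of at least one requested resource
        traits = set(p["traits"])
        if not set(required) <= traits:
            continue  # missing a required trait
        if traits & set(forbidden):
            continue  # carries a forbidden trait
        matches.append(p["name"])
    return matches


# Invented sample data for two compute nodes.
providers = [
    {"name": "cn1",
     "inventory": {"VCPU": {"total": 16, "used": 14},
                   "MEMORY_MB": {"total": 32768, "used": 8192}},
     "traits": ["HW_CPU_X86_AVX2"]},
    {"name": "cn2",
     "inventory": {"VCPU": {"total": 16, "used": 4},
                   "MEMORY_MB": {"total": 32768, "used": 30720}},
     "traits": ["HW_CPU_X86_AVX2", "STORAGE_DISK_SSD"]},
]

# cn1 lacks free VCPU and cn2 lacks free memory, so nothing matches.
no_match = matching_providers(providers, {"VCPU": 4, "MEMORY_MB": 4096},
                              required=["HW_CPU_X86_AVX2"])
# A smaller request fits on cn1.
small_match = matching_providers(providers, {"VCPU": 2, "MEMORY_MB": 4096},
                                 required=["HW_CPU_X86_AVX2"])
```

An empty list here is just an empty result, which is the point being made: the request was evaluated normally and nothing qualified.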


What we're talking about here is debugging information, plain and simple.

If a SELECT statement against an Oracle DB returns 0 rows, is that an exception? No. Would an operator need to re-send the SELECT statement with an EXPLAIN SELECT in order to get information about what indexes were used to winnow the result set (to zero)? Yes. Either that, or the operator would need to gradually re-execute smaller SELECT statements containing fewer filters in order to determine which join or predicate caused a result set to contain zero rows.
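The "re-execute with fewer filters" approach can be sketched roughly as follows. This is an illustration only: it uses an in-memory SQLite database rather than Oracle, the hosts table and its columns are made up, and the helper name is hypothetical. It builds the WHERE clause up one predicate at a time, which is the same idea as stripping filters off the full query, just run in the opposite direction.

```python
import sqlite3


def explain_empty_result(conn, table, predicates):
    """Apply predicates cumulatively and record the row count after each one.

    The table and predicate strings are interpolated directly into SQL, so
    this is only suitable for trusted, operator-written input.
    """
    report = []
    applied = []
    for pred in predicates:
        applied.append(pred)
        where = " AND ".join(applied)
        count = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {where}"
        ).fetchone()[0]
        report.append((pred, count))
    return report


# Invented sample data standing in for compute hosts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hosts (name TEXT, vcpus_free INTEGER, "
    "ram_free_mb INTEGER, has_ssd INTEGER)"
)
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?, ?, ?)",
    [("cn1", 8, 4096, 0), ("cn2", 2, 16384, 1), ("cn3", 16, 2048, 1)],
)

report = explain_empty_result(
    conn, "hosts",
    ["vcpus_free >= 4", "ram_free_mb >= 8192", "has_ssd = 1"],
)
# First predicate whose addition leaves zero rows, if any.
culprit = next((pred for pred, count in report if count == 0), None)
```

With this sample data the count drops to zero once the memory predicate is added, so that is the filter to look at first. Note that the order in which predicates are applied affects which one gets reported.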

That's exactly what we're talking about here. It's not an exception. It's debugging information.


Best,
-jay
