Re: [openstack-dev] [nova] How to debug no valid host failures with placement

Michael Glasgow Sat, 04 Aug 2018 16:36:57 -0700

On 8/2/2018 7:27 PM, Jay Pipes wrote:

It's not an exception. It's normal course of events. NoValidHosts means there were no compute nodes that met the requested resource amounts.

To clarify, I didn't mean a Python exception. I concede that I should've chosen a better word for the type of object I have in mind.

If a SELECT statement against an Oracle DB returns 0 rows, is that an exception? No. Would an operator need to re-send the SELECT statement with an EXPLAIN SELECT in order to get information about what indexes were used to winnow the result set (to zero)? Yes. Either that, or the operator would need to gradually re-execute smaller SELECT statements containing fewer filters in order to determine which join or predicate caused a result set to contain zero rows.
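For illustration only, the "re-execute with fewer filters" approach described above might look like the following sketch. It uses Python's sqlite3 module against a made-up `hosts` table; the table, its columns, the predicates, and the helper name `rows_after_each_predicate` are all assumptions for the example and are not the placement schema or any nova code.

```python
import sqlite3


def rows_after_each_predicate(conn, table, predicates):
    """Apply predicates cumulatively and record how many rows survive each one.

    The table name and predicates are trusted literals here; this is a
    debugging sketch, not something to feed user input into.
    """
    results = []
    applied = []
    for pred in predicates:
        applied.append(pred)
        where = " AND ".join(f"({p})" for p in applied)
        count = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {where}"
        ).fetchone()[0]
        results.append((pred, count))
    return results


# Hypothetical data: three compute nodes with made-up capacities.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hosts (name TEXT, vcpus INTEGER, ram_mb INTEGER, disk_gb INTEGER)"
)
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?, ?, ?)",
    [
        ("node1", 8, 16384, 100),
        ("node2", 16, 32768, 40),
        ("node3", 4, 8192, 500),
    ],
)

predicates = ["vcpus >= 8", "ram_mb >= 16384", "disk_gb >= 200"]
report = rows_after_each_predicate(conn, "hosts", predicates)
for pred, remaining in report:
    print(f"{pred:20s} -> {remaining} row(s) remaining")
```

The first predicate whose count drops to zero is the one that emptied the result set. The cost is one extra query per predicate, which is the repeated work the quoted paragraph describes.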

I'm not sure if this analogy fully appreciates the perspective of the operator. You're correct of course that if you select on a db and the correct answer is zero rows, then zero rows is the right answer, 100% of the time.

Whereas what I thought we meant when we talk about "debugging no valid host failures" is that zero rows is *not* the right answer, and yet you're getting zero rows anyway. So yes, absolutely with an Oracle DB you would get an ORA-XXXXX exception in that case, along with a trace file that told you where things went off the rails. Which is exactly what we don't have here.

If I understand your perspective correctly, it's basically that placement is working as designed, so there's nothing more to do except pore over debug output. Can we consider:

(1) that might not always be true if there are bugs

(2) even when it is technically true, from the user's perspective, I'd posit that it's rare that a user requests an instance with the express intent of not launching an instance. (?) If they're "debugging" this issue, it means there's a misconfiguration or some unexpected state that they have to go find. So it is exceptional in that sense, and either the operator or the user is going to need to know why the request failed in a large majority of these cases.

I would love to hear from any large operators on the list whether they feel that "turn on debug and try again" is really acceptable here. I'm not trying to be critical; I'm just convinced that once the cluster is of a certain size, that approach can start to become very expensive.


--
Michael Glasgow

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
