On 8/2/2018 7:27 PM, Jay Pipes wrote:
> It's not an exception. It's normal course of events. NoValidHosts means there were no compute nodes that met the requested resource amounts.

To clarify, I didn't mean a python exception. I concede that I should've chosen a better word for the type of object I have in mind.

> If a SELECT statement against an Oracle DB returns 0 rows, is that an exception? No. Would an operator need to re-send the SELECT statement with an EXPLAIN SELECT in order to get information about what indexes were used to winnow the result set (to zero)? Yes. Either that, or the operator would need to gradually re-execute smaller SELECT statements containing fewer filters in order to determine which join or predicate caused a result set to contain zero rows.

I'm not sure if this analogy fully appreciates the perspective of the operator. You're correct of course that if you select on a db and the correct answer is zero rows, then zero rows is the right answer, 100% of the time.

What I thought we meant by "debugging no valid host failures" is the case where zero rows is *not* the right answer, and yet you're getting zero rows anyway. So yes, absolutely, with an Oracle DB you would get an ORA-XXXXX exception in that case, along with a trace file that told you where things went off the rails. That is exactly what we don't have here.
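
To make the kind of trace I have in mind concrete, here is a minimal sketch. It is purely illustrative: the function name, the host dictionaries, and the filters are all hypothetical, and none of it is nova or placement code. It only shows the idea of recording, per filter, how many candidates went in and how many came out, so that an empty result explains itself without a second run.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: a "host" is just a dict of resource amounts,
# and a "filter" is a (label, predicate) pair. Not real nova/placement types.
Host = Dict[str, int]
Filter = Tuple[str, Callable[[Host], bool]]


def filter_with_trace(
    hosts: List[Host], filters: List[Filter]
) -> Tuple[List[Host], List[Tuple[str, int, int]]]:
    """Apply filters in order and record (label, before, after) for each."""
    trace = []
    remaining = list(hosts)
    for label, predicate in filters:
        before = len(remaining)
        remaining = [h for h in remaining if predicate(h)]
        trace.append((label, before, len(remaining)))
        if not remaining:
            # Nothing left to filter; the last trace entry names the culprit.
            break
    return remaining, trace


hosts = [
    {"vcpus": 4, "ram_mb": 8192},
    {"vcpus": 16, "ram_mb": 4096},
]
filters = [
    ("vcpus >= 8", lambda h: h["vcpus"] >= 8),
    ("ram_mb >= 8192", lambda h: h["ram_mb"] >= 8192),
]

survivors, trace = filter_with_trace(hosts, filters)
for label, before, after in trace:
    print(f"{label}: {before} -> {after}")
```

With a record like this attached to the failed request, the operator can see which step took the candidate set to zero instead of turning on debug and retrying.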

If I understand your perspective correctly, it's basically that placement is working as designed, so there's nothing more to do except pore over debug output. Can we consider:

(1) that might not always be true if there are bugs

(2) even when it is technically true, from the user's perspective, I'd posit that it's rare that a user requests an instance with the express intent of not launching an instance. (?) If they're "debugging" this issue, it means there's a misconfiguration or some unexpected state that they have to go find. So it is exceptional in that sense, and either the operator or the user is going to need to know why the request failed in a large majority of these cases.

I would love to hear from any large operators on the list whether they feel that "turn on debug and try again" is really acceptable here. I'm not trying to be critical; I'm just convinced that once the cluster is of a certain size, that approach can start to become very expensive.

--
Michael Glasgow

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
