> Whoah, but that's after 10 tries (by default). And if e.g. it bounced
> because the instance is too big for the host, but other, smaller
> instances come in and succeed in the meantime, that could wind up being
> stretched indefinitely. Doesn't sound like a complete answer to this issue.

No dude, remember, this is all assuming that claiming with placement eliminates 100% of the resource races :) The _only_ things left to reschedule for are (a) straight-up 100%-fail compute host misconfigurations and (b) anything that fails some percentage of the time and will actually be resolved by trying a different host (i.e. baseline 40% Ironic IPMI failbots).

> Today you can limit the set of compute hosts to try by specifying an
> "availability zone". Perhaps the answer here is to support some kind of
> "exclude these hosts" list to a "fresh" deploy.
>
> But is the cure worse than the disease?

I (and I think others) would argue that the user needing to know that they should try a different AZ is not reasonable UX. A rebuild of an instance that failed to boot can/should exclude the original host on the rebuild attempt. It does that today with reschedules, so it's not that hard; it just requires some plumbing.

--Dan

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators