I would think we have enough tracking information to support the goal of identifying failures. In any scenario, some of the failures will simply be unrecoverable.
Regarding the process crashing, who's to say the retry process also wouldn't crash? We could endlessly argue the arbiter/watchdog processes will crash at each tier. As such, I think it's better to say we need a simpler mechanism for identifying failures and perhaps a best-effort retry.

Retrying can be scary, to say the least. You can't possibly handle all of the possible failure scenarios, and some of the ones you think you can might be different in subtle ways such that retrying them only causes more issues.

I agree with Lamar that we could make things significantly more reliable, and I think that's where we should start. We may find that, after some stabilization work, the failure rate is acceptably low and any retry mechanism is no longer required.

On 8/29/11 11:24 AM, "Kevin L. Mitchell" <kevin.mitch...@rackspace.com> wrote:

>On Fri, 2011-08-26 at 23:10 +0000, Monsyne Dragon wrote:
>> First off, I think it would be better if whatever had the failure
>> responded by sending a request somewhere (a cast) to say "Hey, this
>> bombed. Retry it. "
>
>What if the failure was due to the process crashing, so that it can't
>possibly send a request/cast off for retry?
>--
>Kevin L. Mitchell <kevin.mitch...@rackspace.com>
>
>This email may include confidential information. If you received it in
>error, please delete it.
>_______________________________________________
>Mailing list: https://launchpad.net/~openstack
>Post to : openstack@lists.launchpad.net
>Unsubscribe : https://launchpad.net/~openstack
>More help : https://help.launchpad.net/ListHelp