On Fri, 18 Oct 2013 10:34:11 +0100 Steven Hardy <sha...@redhat.com> wrote:
> IMO we don't want to go down the path of retry-loops in Heat, or scheduled
> self-healing. We should just allow the user to trigger an stack update from
> a failed state (CREATE_FAILED, or UPDATE_FAILED), and then they can define
> their own policy on when recovery happens by triggering a stack update.
I think "retry" has two different implications in this topic, so I'd like to
clarify what "retry" means in each case.

=============================
1) Stack creation retry, proposed here:
   https://blueprints.launchpad.net/heat/+spec/retry-failed-update
   - trigger:  a stack update issued against a failed stack
   - function: replace the failed resources and continue processing

2) API retry, proposed here (our blueprint):
   https://blueprints.launchpad.net/heat/+spec/support-retry-with-idempotency
   - trigger:  no API response, or an unexpected response code
   - function: retry the API request until it gets the expected response
     code or it reaches the retry limit
=============================

Our proposal is 2). Once the retry limit is exceeded, the stack would change
to an XXX_FAILED status, which is the same as Heat's current behavior; we
won't change the mechanism of stack state transitions. A rough sketch of how
2) could work is below.

I understand proposal 1) aims to restart processing of a failed stack.
These are concerns at different layers, and both functions can coexist.
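To make 2) concrete, here is a minimal sketch of the intended flow. All names
in it (call_api, RetryLimitExceeded, the default limits) are hypothetical and
illustration only, not an existing Heat or nova interface; a real
implementation would build on the ClientToken blueprint referenced in the
thread below.

    # Hypothetical sketch of "API retry with idempotency" (proposal 2).
    # call_api and RetryLimitExceeded are illustration-only names, not part
    # of Heat or any client library. The point is: reuse one client token so
    # a retried request that already succeeded server-side stays idempotent,
    # and give up once the retry limit is reached (the stack then goes
    # *_FAILED, the same as today).
    import time
    import uuid


    class RetryLimitExceeded(Exception):
        """Raised when the retry limit is reached."""


    def create_with_retry(call_api, retry_limit=5, wait=2):
        # One token per logical request; every retry carries the same token.
        client_token = str(uuid.uuid4())

        for _attempt in range(retry_limit):
            try:
                response = call_api(client_token=client_token)
            except IOError:
                # No response at all (timeout, connection reset): retry.
                time.sleep(wait)
                continue

            if response.status_code in (200, 202):
                return response  # expected response code: done
            # Unexpected response code: retry with the same client token.
            time.sleep(wait)

        # Over the retry limit: the caller moves the stack to XXX_FAILED,
        # just as Heat does today when a create fails.
        raise RetryLimitExceeded()

The client token is what matters here: without it, a retried create could
produce a duplicate resource when the first request actually succeeded but
its response was lost, which is why we depend on the nova ClientToken
blueprint mentioned below.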
On Fri, 18 Oct 2013 10:34:11 +0100 Steven Hardy <sha...@redhat.com> wrote:
> On Fri, Oct 18, 2013 at 12:13:45PM +1300, Steve Baker wrote:
> > On 10/18/2013 01:54 AM, Mitsuru Kanabuchi wrote:
> > > Hello Mr. Clint,
> > >
> > > Thank you for your comment and prioritization.
> > > I'm glad to discuss you who feel same issue.
> > >
> > >> I took the liberty of targeting your blueprint at icehouse. If you don't
> > >> think it is likely to get done in icehouse, please raise that with us at
> > >> the weekly meeting if you can and we can remove it from the list.
> > > Basically, this blueprint is targeted IceHouse release.
> > >
> > > However, the schedule is depend on follows blueprint:
> > > https://blueprints.launchpad.net/nova/+spec/idempotentcy-client-token
> > >
> > > We're going to start implementation to Heat after ClientToken implemented.
> > > I think ClientToken is necessary function for this blueprint, and
> > > important function for other callers!
> > Can there not be a default retry implementation which deletes any
> > ERRORed resource and attempts the operation again? Then specific
> > resources can switch to ClientToken as they become available.
>
> Yes, I think this is the way to go - have logic in every resources
> handle_update (which would probably be common with check_create_complete),
> which checks the status of the underlying physical resource, and if it's
> not in the expected status, we replace it.
>
> This probably needs to be a new flag or API operation, as it clearly has
> the possibility to be more destructive than a normal update (may delete
> resources which have not changed in the template, but are in a bad state)
>
> > > On Wed, 16 Oct 2013 23:32:22 -0700
> > > Clint Byrum <cl...@fewbar.com> wrote:
> > >
> > >> Excerpts from Mitsuru Kanabuchi's message of 2013-10-16 04:47:08 -0700:
> > >>> Hi all,
> > >>>
> > >>> We proposed a blueprint that supports API retry function with
> > >>> idenpotency for Heat.
> > >>> Prease review the blueprint.
> > >>>
> > >>> https://blueprints.launchpad.net/heat/+spec/support-retry-with-idempotency
> > >>>
> > >> This looks great. It addresses some of what I've struggled with while
> > >> thinking of how to handle the retry problem.
> > >>
> > >> I went ahead and linked bug #1160052 to the blueprint, as it is one that
> > >> I've been trying to get a solution for.
> > >>
> > >> I took the liberty of targeting your blueprint at icehouse. If you don't
> > >> think it is likely to get done in icehouse, please raise that with us at
> > >> the weekly meeting if you can and we can remove it from the list.
> > >>
> > >> Note that there is another related blueprint here:
> > >>
> > >> https://blueprints.launchpad.net/heat/+spec/retry-failed-update
> > >>
> >
> > Has any thought been given to where the policy should be specified for
> > how many retries to attempt?
> >
> > Maybe sensible defaults should be defined in the python resources, and a
> > new resource attribute can allow an override in the template on a
> > per-resource basis (I'm referring to an attribute at the same level as
> > Type, Properties, Metadata)
>
> IMO we don't want to go down the path of retry-loops in Heat, or scheduled
> self-healing. We should just allow the user to trigger an stack update from
> a failed state (CREATE_FAILED, or UPDATE_FAILED), and then they can define
> their own policy on when recovery happens by triggering a stack update.
>
> This is basically what's described for discussion here:
> http://summit.openstack.org/cfp/details/95
>
> I personally think the scheduled self-healing is a bad idea, but the
> convergence (as a special type of stack update) is a good one.
>
> For automatic recovery, we should instead be looking at triggering things
> via Ceilometer alarms, so we can move towards removing all periodic task
> stuff from Heat (because it doesn't scale, and it presents major issues
> when scaling out)
>
> Steve

--------------------
  Mitsuru Kanabuchi
    NTT Software Corporation
    E-Mail : kanabuchi.mits...@po.ntts.co.jp