On 24-Feb-16 14:26, Anant Patil wrote:
> On 24-Feb-16 13:12, Clint Byrum wrote:
>> Excerpts from Anant Patil's message of 2016-02-23 23:08:31 -0800:
>>> Hi,
>>>
>>> I would like to discuss various approaches towards fixing bug
>>> https://launchpad.net/bugs/1533176
>>>
>>> When convergence is on and the stack is stuck, there is no way to
>>> cancel the existing request. This feature was not implemented in
>>> convergence, since the user can simply issue another update on an
>>> in-progress stack. But if a resource worker is stuck, the new update
>>> will wait forever on it and will never take effect.
>>>
>>> The solution is to implement a cancel request. Since the work for a
>>> stack is distributed among heat engines, the cancel request cannot
>>> work the way it does in the legacy engine: many or all of the heat
>>> engines might be running worker threads to provision the stack.
>>>
>>> I could think of two options which I would like to discuss:
>>>
>>> (a) When a user-triggered cancel request is received, set the stack's
>>> current traversal to None or something else other than the current
>>> traversal. With this, no new check-resource workers will be
>>> triggered. This is fine as long as no worker is stuck: the existing
>>> workers finish running, no new workers are triggered, and the cancel
>>> is graceful. But workers that are stuck will stay stuck until the
>>> stack times out. To handle such cases, we would have to implement
>>> logic to "poll" the DB at regular intervals (perhaps at each step()
>>> of the scheduler task) and bail out if the current traversal has been
>>> updated. Basically, each worker polls the DB to see whether the
>>> current traversal is still valid and, if not, stops itself. The
>>> drawback of this approach is that all the workers hit the DB,
>>> incurring significant overhead; moreover, every one of the stack's
>>> workers keeps hitting the DB whether or not it will ever be
>>> cancelled. The advantage is that it is probably easier to implement.
>>> Also, if a worker is stuck inside a particular "step", this approach
>>> will not work.
>>
>> I think this is the simplest option. And if the polling gets to be too
>> much, you can implement an observer pattern where one worker is just
>> assigned to poll the traversal, and if it changes, RPC the known
>> active workers to tell them to cancel any jobs using the now-cancelled
>> stack version.
>
> Hi Clint,
>
> I see that the observer pattern is simple, but IMO it too is not
> efficient. To implement it, we would have to record the
> worker-to-engine-id relationship in the DB for all the workers, and
> then go through all of them and send targeted cancel messages. It
> would also require a thread group manager in each engine, so that the
> engine can stop the thread group running the stack's workers.
>
> Please help me understand if there is any particular disadvantage in
> option (b) that I am not missing.

Sorry, I meant I am missing :)

> -- Anant
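
For concreteness, option (a)'s per-step poll might look something like
the sketch below. This is a minimal sketch only, assuming the worker
drives its scheduler task as a generator; none of the names
(run_with_cancel_poll, fetch_current_traversal, TraversalCancelled) are
Heat's actual API.

    class TraversalCancelled(Exception):
        """The stack's current traversal no longer matches this worker's."""


    def run_with_cancel_poll(task, fetch_current_traversal, my_traversal):
        """Step `task`, checking for cancellation between steps.

        `fetch_current_traversal` is the per-step DB hit being debated:
        it re-reads the stack row and returns its current traversal id,
        which a cancel request would have overwritten (e.g. with None).
        """
        for _ in task:
            # One step() of the scheduler task has completed; bail out
            # if our traversal is no longer the stack's current one.
            if fetch_current_traversal() != my_traversal:
                raise TraversalCancelled()

Note that the poll only happens between steps, which is exactly why a
worker stuck inside a single step still cannot be cancelled this way.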
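
Clint's observer variant, sketched under the same caveat (all DB and RPC
helpers here are placeholders, not real Heat or oslo.messaging
interfaces): one observer per in-progress stack polls the traversal, so
the per-worker DB hits go away, and cancels are fanned out only when the
traversal actually changes.

    import time

    def observe_traversal(stack_id, traversal_id, fetch_current_traversal,
                          fetch_active_engines, rpc_cast_cancel,
                          poll_interval=2.0):
        """Poll until the traversal is superseded, then fan out cancels."""
        # A real observer would also stop once the traversal completes
        # normally; that bookkeeping is omitted here.
        while fetch_current_traversal(stack_id) == traversal_id:
            time.sleep(poll_interval)
        # The traversal changed (a cancel or a newer update). Tell each
        # engine known to hold workers for this traversal to stop them.
        for engine_id in fetch_active_engines(stack_id, traversal_id):
            rpc_cast_cancel(engine_id, stack_id, traversal_id)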
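
And the engine-side cost raised in the reply above: for those targeted
cancel messages to do anything, each engine needs to track which threads
belong to which (stack, traversal) and be able to stop them, roughly as
below. ThreadGroup stands in for an eventlet-style thread group; the
recording callback is the worker-to-engine-id bookkeeping the reply
mentions, and again none of this is Heat's actual implementation.

    class ThreadGroupManager(object):
        def __init__(self, record_worker_engine):
            # `record_worker_engine` is assumed to persist the
            # worker-to-engine-id mapping in the DB, so the observer
            # knows where to send targeted cancel messages.
            self._record = record_worker_engine
            self._groups = {}  # (stack_id, traversal_id) -> thread group

        def start(self, engine_id, stack_id, traversal_id, group):
            self._record(engine_id, stack_id, traversal_id)
            self._groups[(stack_id, traversal_id)] = group

        def stop_workers(self, stack_id, traversal_id):
            """Handler for a targeted cancel RPC from the observer."""
            group = self._groups.pop((stack_id, traversal_id), None)
            if group is not None:
                group.stop()  # kill every worker thread in the group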

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev