Hi Anant,

For the second option, if the leader engine fails, how would a new leader election be triggered?

Best Regards,
Yingzhe Zeng
> To: openstack-dev@lists.openstack.org
> From: anant.pa...@hpe.com
> Date: Wed, 30 Sep 2015 12:40:52 +0530
> Subject: [openstack-dev] [heat] Convergence: Detecting and handling worker failures
>
> Hi,
>
> One of the remaining items in convergence is detecting and handling engine
> (the engine worker) failures, and here are my thoughts.
>
> Background: Since the work is distributed among heat engines, heat needs
> some means to detect a failure, pick up the tasks from the failed engine,
> and either re-distribute them or run them again.
>
> One simple way is to poll the DB to detect liveness by checking the table
> populated by heat-manage. Each engine records its presence periodically by
> updating the current timestamp. All the engines will have a periodic task
> that checks the DB for the liveness of other engines. Each engine will
> check the timestamps updated by other engines, and if it finds one that is
> older than the periodicity of timestamp updates, it detects a failure. When
> this happens, the remaining engines, as and when they detect the failure,
> will try to acquire the lock for the in-progress resources that were
> handled by the engine which died. They will then run the tasks to
> completion.
>
> Another option is to use a coordination library like the community-owned
> tooz (http://docs.openstack.org/developer/tooz/), which supports
> distributed locking and leader election. We would use it to elect a leader
> among heat engines, and the leader would be responsible for running
> periodic tasks that check the state of each engine and for distributing the
> tasks to other engines when one fails. The advantage, IMHO, will be
> simplified heat code. Also, we can move the timeout task to the leader,
> which will run the timeout for all the stacks and send a signal to abort
> the operation when a timeout happens. The downside: an external resource
> like ZooKeeper/memcached etc. is needed for leader election.
>
> In the long run, IMO, using a library like tooz will be useful for heat.
> A lot of the boilerplate needed for locking and running centralized tasks
> (such as timeout) will not be needed in heat. Given that we are moving
> towards distribution of tasks and horizontal scaling is preferred, it will
> be advantageous to use such a library.
>
> Please share your thoughts.
>
> - Anant
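The first option (DB heartbeat polling) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not heat's implementation: SQLite stands in for the heat database, and the `engine` and `resource` tables, their columns, the `IN_PROGRESS` status string and the 60-second interval are all hypothetical. "Acquiring the lock" on a dead engine's resources is modelled here as a per-row compare-and-swap `UPDATE` on a hypothetical `engine_id` column, so that when several surviving engines detect the same failure, each in-progress resource is claimed by at most one of them.

```python
import sqlite3
import time

# Hypothetical schema standing in for heat's DB; real table/column names differ.
SCHEMA = """
CREATE TABLE engine (engine_id TEXT PRIMARY KEY, updated_at REAL NOT NULL);
CREATE TABLE resource (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL,
    engine_id TEXT  -- engine currently holding this resource's lock, or NULL
);
"""

REPORT_INTERVAL = 60.0  # assumed number of seconds between heartbeats


def heartbeat(db, engine_id, now=None):
    """Record that engine_id is alive at time `now` (defaults to time.time())."""
    now = time.time() if now is None else now
    db.execute(
        "INSERT OR REPLACE INTO engine (engine_id, updated_at) VALUES (?, ?)",
        (engine_id, now),
    )


def dead_engines(db, self_id, now=None, timeout=REPORT_INTERVAL):
    """Return the other engines whose last heartbeat is older than `timeout`."""
    now = time.time() if now is None else now
    rows = db.execute(
        "SELECT engine_id FROM engine WHERE engine_id != ? AND updated_at < ?",
        (self_id, now - timeout),
    ).fetchall()
    return sorted(r[0] for r in rows)


def take_over(db, self_id, dead_id):
    """Claim the in-progress resources locked by dead_id; return claimed ids.

    Each UPDATE is a compare-and-swap on the lock column: it only matches if
    the row is still held by the dead engine, so when several survivors race,
    at most one of them succeeds for any given resource.
    """
    candidates = db.execute(
        "SELECT id FROM resource WHERE engine_id = ? AND status = 'IN_PROGRESS'",
        (dead_id,),
    ).fetchall()
    claimed = []
    for (res_id,) in candidates:
        cur = db.execute(
            "UPDATE resource SET engine_id = ? WHERE id = ? AND engine_id = ?",
            (self_id, res_id, dead_id),
        )
        if cur.rowcount == 1:  # we won the race for this resource
            claimed.append(res_id)
    db.commit()
    return claimed


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    heartbeat(db, "engine-a", now=0.0)
    heartbeat(db, "engine-b", now=0.0)
    db.execute("INSERT INTO resource VALUES (1, 'IN_PROGRESS', 'engine-b')")
    db.execute("INSERT INTO resource VALUES (2, 'COMPLETE', 'engine-b')")
    heartbeat(db, "engine-a", now=100.0)  # engine-b has stopped reporting
    for dead in dead_engines(db, "engine-a", now=100.0):
        print(dead, take_over(db, "engine-a", dead))  # prints: engine-b [1]
```

The staleness check uses the heartbeat interval itself as the threshold, as described above; in practice one would probably allow some slack (for example, a couple of missed heartbeats) so that a briefly slow engine is not declared dead, and would have to account for clock skew between engines writing the timestamps.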
__________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev