On Tue, Sep 5, 2017 at 1:03 PM Raghurama Bhat <rb...@proofpoint.com> wrote:
> Hi Rick,
>
> My question was more generic: how do I build self-healing into the
> system so that I can replace broken machines? In the scenario that you
> describe below, will the remaining workers be correctly associated with
> the new master and etcd? Will it correctly trigger the charm logic on
> the worker machines to point them at the new etcd and master?

So in that scenario the new machine will come up, and the charms will be told about the new units available to be put into the pool. However, how the pool of running instances of the software is managed, and how it recovers from a fallen comrade, is up to the software itself. In the case of a datastore like etcd, it's up to etcd to manage how many instances are in the cluster and how it responds when one is removed and another is added. Juju doesn't do anything special with the actual process there; it makes sure that all the information about each running unit, its configuration, and the details of each unit of related applications stays up to date.

To make things resilient, you'd want to wire up monitoring to scripts (potentially written with libjuju) that respond however you'd like to different types of failure. That might mean adding new units, scaling up by adding new units with increased constraints, performing actions against charms, or sending automated messages to folks on call. That's very business-specific, and hopefully users have the building blocks and tools they need to solve their business problems.

Rick
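To make the "monitoring wired to scripts" idea concrete, here is a minimal sketch of the decision layer such a script might contain. Everything here is hypothetical and illustrative, not part of any Juju API: `FailureType` and `decide_response` are made-up names, and the remediation plans a driver would then carry out (for example with libjuju, or by shelling out to the `juju` CLI) are only described in comments.

```python
# Hypothetical self-healing policy: map a detected failure type to a
# remediation plan. A separate driver loop (not shown) would execute the
# plan, e.g. via libjuju or the juju CLI, or page a human when automation
# isn't appropriate. All names below are illustrative assumptions.
from enum import Enum, auto


class FailureType(Enum):
    UNIT_DOWN = auto()           # a unit's machine died and was removed
    CAPACITY_EXHAUSTED = auto()  # the remaining units can't handle the load
    DATA_CORRUPTION = auto()     # something automation shouldn't touch


def decide_response(failure: FailureType) -> dict:
    """Return a remediation plan for the given failure (illustrative policy)."""
    if failure is FailureType.UNIT_DOWN:
        # Replace the lost unit one-for-one.
        return {"action": "add-unit", "count": 1}
    if failure is FailureType.CAPACITY_EXHAUSTED:
        # Scale up with bigger machines by requesting increased constraints.
        return {"action": "add-unit", "count": 2, "constraints": "mem=8G cores=4"}
    # Anything we can't safely remediate automatically goes to a human.
    return {"action": "page-on-call"}


if __name__ == "__main__":
    # A monitoring hook would call this with whatever failure it detected.
    plan = decide_response(FailureType.UNIT_DOWN)
    print(plan)
```

The point of separating the policy from the execution is exactly what the reply describes: the response to each failure type is business-specific, so the part worth customizing is this mapping, while the mechanics of adding units or running actions stay generic.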
--
Juju mailing list
Juju@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju