Looking at Master.scala, I don't see any code that would bring the master back up automatically. You could set up a monitoring tool so that you get an alert when a master goes down.
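As a rough sketch of such a check (this is not something Spark ships), a small script could probe each master's RPC port and report the ones that do not answer. The host names `master1` and `master2` and the helper names below are made up for illustration; 7077 is the default standalone master port, so adjust it if yours differs.

```python
import socket


def master_is_reachable(host: str, port: int = 7077, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the master's port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def down_masters(masters, probe=master_is_reachable):
    """Return the (host, port) pairs for which the probe fails."""
    return [(host, port) for host, port in masters if not probe(host, port)]


# Example use (hypothetical host names), e.g. run from cron:
#   for host, port in down_masters([("master1", 7077), ("master2", 7077)]):
#       print(f"ALERT: Spark master {host}:{port} is not reachable")
```

Once alerted, the dead master can be restarted by hand with sbin/start-master.sh on that host; with ZooKeeper recovery mode enabled it should rejoin as a standby.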
For example, see this question on setting up alerts in Ganglia:
http://stackoverflow.com/questions/12896998/how-to-set-up-alerts-on-ganglia

More experienced users may have better suggestions.

On Thu, Jun 30, 2016 at 2:09 AM, vimal dinakaran <vimal3...@gmail.com> wrote:

> Hi Ted,
> Thanks for the pointers. I had a three-node ZooKeeper setup. Now only the
> master dies when a ZooKeeper instance is down, a new master is
> elected as leader, and the cluster is up.
> But the master that went down never comes back up.
>
> Is this expected? Is there a way to get an alert when a master is down?
> How can I make sure that at least one backup master is always up?
>
> Thanks
> Vimal
>
> On Tue, Jun 28, 2016 at 7:24 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Please see these posts on the number of nodes in the quorum:
>>
>> http://stackoverflow.com/questions/13022244/zookeeper-reliability-three-versus-five-nodes
>>
>> http://www.ibm.com/developerworks/library/bd-zookeeper/
>> the paragraph starting with 'A quorum is represented by a strict
>> majority of nodes'
>>
>> FYI
>>
>> On Tue, Jun 28, 2016 at 5:52 AM, vimal dinakaran <vimal3...@gmail.com>
>> wrote:
>>
>>> I am using ZooKeeper to provide HA for a Spark cluster. We have a
>>> two-node ZooKeeper cluster.
>>>
>>> When one of the ZooKeeper nodes dies, the entire Spark cluster goes down.
>>>
>>> Is this expected behaviour?
>>> Am I missing something in the config?
>>>
>>> Spark version - 1.6.1
>>> Zookeeper version - 3.4.6
>>> // spark-env.sh
>>> SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER
>>> -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181"
>>>
>>> Below is the log from the Spark master:
>>> ZooKeeperLeaderElectionAgent: We have lost leadership
>>> 16/06/27 09:39:30 ERROR Master: Leadership has been revoked -- master
>>> shutting down.
>>>
>>> Thanks
>>> Vimal
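
On the quorum point quoted above ('a strict majority of nodes'), the arithmetic behind the original two-node report can be sketched as follows. The function names are my own, and this only illustrates the majority rule, not ZooKeeper's actual implementation.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with this many servers."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a strict majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (2, 3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# prints:
# 2 2 0
# 3 2 1
# 5 3 2
```

A two-node ensemble needs both nodes for a majority, so it tolerates no failures, which matches the cluster going down when one of the two ZooKeeper nodes died. Three nodes tolerate one failure.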