Looking at Master.scala, I don't see any code that would bring the master
back up automatically.
You could set up a monitoring tool so that you get an alert when a
master goes down.

e.g.
http://stackoverflow.com/questions/12896998/how-to-set-up-alerts-on-ganglia
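
If a full monitoring stack is more than you need, a small reachability check may be enough to start with. The sketch below is only an illustration, not something taken from Spark itself: it assumes each master listens on the default standalone port 7077, and the function names are made up for this example. It only tells you whether something accepts TCP connections on that port, not whether the master is the leader or a standby.

```python
import socket


def master_is_up(host, port=7077, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    7077 is the default port of a Spark standalone master; adjust it if
    your deployment uses a different one.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


def down_masters(masters, timeout=3.0):
    """Return the (host, port) pairs that did not accept a connection."""
    return [(h, p) for (h, p) in masters if not master_is_up(h, p, timeout)]


# Example use (host names are placeholders):
#   for host, port in down_masters([("master1", 7077), ("master2", 7077)]):
#       print(f"ALERT: Spark master {host}:{port} is not reachable")
```

Run from cron or a similar scheduler, this can send a mail or page whenever the returned list is non-empty. Restarting the dead master would still have to be done by you or by a process supervisor.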

More experienced users may have better suggestions.

On Thu, Jun 30, 2016 at 2:09 AM, vimal dinakaran <vimal3...@gmail.com>
wrote:

> Hi Ted,
> Thanks for the pointers. I had a three-node ZooKeeper setup. Now the
> master alone dies when a ZooKeeper instance is down, a new master is
> elected as leader, and the cluster stays up.
> But the master that went down never comes back up.
>
> Is this expected? Is there a way to get an alert when a master is down?
> How can I make sure that at least one backup master is always up?
>
> Thanks
> Vimal
>
>
>
>
> On Tue, Jun 28, 2016 at 7:24 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Please see these articles on the number of nodes in the quorum:
>>
>>
>> http://stackoverflow.com/questions/13022244/zookeeper-reliability-three-versus-five-nodes
>>
>> http://www.ibm.com/developerworks/library/bd-zookeeper/
>>   the paragraph starting with 'A quorum is represented by a strict
>> majority of nodes'
>>
>> FYI
>>
>> On Tue, Jun 28, 2016 at 5:52 AM, vimal dinakaran <vimal3...@gmail.com>
>> wrote:
>>
>>> I am using ZooKeeper to provide HA for a Spark cluster. We have a
>>> two-node ZooKeeper cluster.
>>>
>>> When one of the ZooKeeper nodes dies, the entire Spark cluster goes down.
>>>
>>> Is this expected behaviour?
>>> Am I missing something in the config?
>>>
>>> Spark version - 1.6.1.
>>> Zookeeper version - 3.4.6
>>> // spark-env.sh
>>> SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER
>>> -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181"
>>>
>>> Below is the log from spark master:
>>> ZooKeeperLeaderElectionAgent: We have lost leadership
>>> 16/06/27 09:39:30 ERROR Master: Leadership has been revoked -- master
>>> shutting down.
>>>
>>> Thanks
>>> Vimal
>>>
>>>
>>>
>>>
>>
>
