Have you read https://spark.apache.org/docs/latest/spark-standalone.html#high-availability ?
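The standby-master setup described there boils down to pointing each master at a ZooKeeper quorum, roughly like this in conf/spark-env.sh (the ZooKeeper hosts and the /spark directory below are placeholders):

    # conf/spark-env.sh on every master host (zk1/zk2/zk3 and /spark are placeholders)
    export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
      -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
      -Dspark.deploy.zookeeper.dir=/spark"

Workers and applications then register against all masters, e.g. --master spark://master1:7077,master2:7077, and a standby master takes over if the active one dies.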
FYI

On Thu, Aug 11, 2016 at 12:40 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Hi,
>
> Although Spark is fault tolerant when worker nodes go down, as in the example below:
>
> FROM tmp
> [Stage 1:===========>                              (20 + 10) / 100]
> 16/08/11 20:21:34 ERROR TaskSchedulerImpl: Lost executor 3 on xx.xxx.197.216: worker lost
> [Stage 1:========================>                  (44 + 8) / 100]
>
> it can carry on.
>
> However, when the node (the host) that the app was started on goes down,
> the job fails because the driver disappears as well. Is there a way to
> avoid this single point of failure, assuming what I am stating is valid?
>
> Thanks
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
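As for the driver itself, master HA alone does not cover it. One option worth looking at (a sketch, not something from that page) is submitting in cluster deploy mode with supervision, so the driver runs on a worker inside the cluster and the master restarts it if it exits abnormally. The master URLs, class name and jar path below are placeholders:

    # Standalone cluster deploy mode with supervision: the driver runs on a
    # worker node, not the submitting host, and is restarted if it fails.
    spark-submit \
      --master spark://master1:7077,master2:7077 \
      --deploy-mode cluster \
      --supervise \
      --class com.example.MyApp \
      /path/to/myapp.jar

Running on YARN in cluster mode gives a similar effect, since the driver runs inside the application master and can be re-attempted on another node.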