I have also had trouble with workers rejoining the working set. I have typically moved to a Mesos-based setup. Frankly, for high availability you are better off using a cluster manager.
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Fri, Jun 13, 2014 at 8:57 AM, Yana Kadiyska <[email protected]> wrote:

> Hi, I see this has been asked before but has not gotten any satisfactory
> answer so I'll try again:
>
> (here is the original thread I found:
> http://mail-archives.apache.org/mod_mbox/spark-user/201403.mbox/%[email protected]%3E
> )
>
> I have a set of workers dying and coming back again. The master prints the
> following warning:
>
> "Got heartbeat from unregistered worker ...."
>
> What is the solution to this -- rolling the master is very undesirable to
> me as I have a Shark context sitting on top of it (it's meant to be highly
> available).
>
> Insights appreciated -- I don't think an executor going down is very
> unexpected but it does seem odd that it won't be able to rejoin the working
> set.
>
> I'm running Spark 0.9.1 on CDH
