I am trying to set up a Spark cluster with multi-master HA.  I have 3 Spark
nodes connecting to a single ZooKeeper node running on a separate server.
When running in this configuration, over the course of 1-2 hours each node
ends its session because it is not receiving any messages from the server.
The standby nodes reconnect, but if the leader encounters this, it
immediately exits.

The net result is that the cluster slowly dies as each master ends its
session and terminates.

The Spark cluster is not in use, so I don't think this is a GC issue.
Pings, etc. seem reliable.  I have tried adjusting timeouts, but that
doesn't work either.

Any ideas how to resolve this?

Thanks!

-- 
Mark Bidewell
http://www.linkedin.com/in/markbidewell
