These tips were very helpful! By setting SPARK_MASTER_IP as you suggest, I
was able to make progress. Unfortunately, it is unclear to me how to
specify the hadoop-client dependency for a pyspark stand-alone application.
So I still get the EOFException, since I use a non-default Hadoop
distribution.
I finally got it working. Main points:
- I had to add the hadoop-client dependency to avoid a strange EOFException.
- I had to set SPARK_MASTER_IP in conf/start-master.sh to the output of
hostname -f instead of hostname, since Akka does not seem to work properly
with short host names or IP addresses; it requires fully qualified domain
names.
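
For a PySpark driver, the fully qualified name discussed above can also be looked up from Python. The following is only a minimal sketch: the helper name master_url is hypothetical, socket.getfqdn() is used as a rough stand-in for hostname -f, and port 7077 assumes the standalone master's default port.

```python
import socket


def master_url(host=None, port=7077):
    """Build a spark:// URL from a fully qualified host name.

    master_url is a hypothetical helper for illustration only.
    Port 7077 is assumed to be the standalone master's default.
    """
    # With an empty name, socket.getfqdn() resolves the local host,
    # which is roughly what `hostname -f` prints on the master node.
    fqdn = socket.getfqdn(host or "")
    return "spark://%s:%d" % (fqdn, port)


if __name__ == "__main__":
    print(master_url())
```

The resulting string could then be passed as the master argument when creating a SparkContext, assuming the master itself was started with the same fully qualified name.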