I believe I solved my problem: the worker node didn't know which address to send its results back to. After setting SPARK_LOCAL_IP, the program runs as it should.
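For reference, a minimal sketch of the fix. SPARK_LOCAL_IP can be set in conf/spark-env.sh (or exported in the shell) on the worker before starting it; the hostname below is this cluster's worker name, so adjust it for your setup:

```shell
# conf/spark-env.sh on the worker node
# Bind Spark to an address the master/driver can reach back on.
# Without this, the worker may bind to the wrong interface and
# task results never make it back, so jobs hang after
# "TaskSchedulerImpl: Adding task set".
export SPARK_LOCAL_IP=spark-worker-0
```

An IP address reachable from the master works as well as a hostname.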
On Mon, Feb 24, 2014 at 3:55 PM, Anders Bennehag <and...@tajitsu.com> wrote:
> Hello there,
>
> I'm having some trouble with my spark-cluster consisting of
>
> master.censored.dev and
> spark-worker-0
>
> Reading from the output of pyspark, master, and worker-node it seems like
> the cluster is formed correctly and pyspark connects to it. But for some
> reason, nothing happens after "TaskSchedulerImpl: Adding task set". Why is
> this and how can I investigate it further?
>
> I haven't really seen any clues in the web-ui.
>
> The program output is as follows:
> pyspark:
> https://gist.githubusercontent.com/PureW/ebe1b95b9b4814fc2533/raw/e2d08b7b6288afad3cb03238acc3d172291166d8/pyspark+log
> master:
> https://gist.githubusercontent.com/PureW/9889bc9b57a8406599df/raw/4b1faeda8bacff06b5c3a32d75e74ef114933504/Spark-master
> worker:
> https://gist.githubusercontent.com/PureW/7451cd5ed6780f4d1e33/raw/f45971bd1e6cba620db566998a9afd035ea8d529/spark-worker
>
> The code I am running through pyspark can be seen at
> https://gist.github.com/PureW/2c9603bdf1ef2ae772f3
> When the worker-node couldn't access the data, it raised an exception, but
> now there's nothing at all. I've run the code locally and it only takes
> ~15s to finish.
>
> Thanks for any help!
> /Anders