This is most likely because the driver and the slaves are running different
versions of Python. Spark 1.4 (which will be released soon) will double-check
that the Python versions match.
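
One quick way to confirm the mismatch (a minimal sketch, assuming you run it
from the same pyspark session that fails) is to compare the Python version on
the driver with the one an executor reports:

    import sys

    # Version used by the driver
    print(sys.version)

    # Version used by an executor (the lambda runs remotely on a worker)
    print(sc.parallelize([0], 1).map(lambda _: __import__("sys").version).first())

If the versions differ badly enough, the map itself can fail with the same
"unknown opcode" error, which would also confirm the mismatch.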

Also, the environment variable should be PYSPARK_PYTHON, not SPARK_PYTHON.
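
For example (the Anaconda path and script name below are placeholders, adjust
them to your install), set it in conf/spark-env.sh on every node:

    export PYSPARK_PYTHON=/opt/anaconda/bin/python

or inline when launching a job:

    PYSPARK_PYTHON=/opt/anaconda/bin/python spark-submit your_job.py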

On Tue, May 26, 2015 at 11:21 AM, Nikhil Muralidhar <nmural...@gmail.com> wrote:
> Hello,
>   I am trying to run a Spark job (which runs fine on the master node of the
> cluster) on an HDFS Hadoop cluster using YARN. When I run the job, which
> contains an rdd.saveAsTextFile() call, I get the following error:
>
> SystemError: unknown opcode
>
> The entire stacktrace has been appended to this message.
>
>  All the nodes on the cluster, including the master, are running Python
> 2.7.9, and all of them have the variable SPARK_PYTHON set to the Anaconda
> Python path. When I start the pyspark shell on these instances, it opens
> with Anaconda Python.
>
> I installed Anaconda on all the slaves after reading about the Python
> version incompatibility issues mentioned in the following post:
>
>
> http://glennklockwood.blogspot.com/2014/06/spark-on-supercomputers-few-notes.html
>
> Please let me know what the issue might be.
>
> The Spark version we are using is 1.3.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
