Re: Error about PySpark

Hyung Sung Shim Wed, 01 Feb 2017 01:34:01 -0800

Hello.
You don't need to remove /tmp/zeppelin_pyspark-4018989172273347075.py,
because it is generated automatically when you run the pyspark command.
I also think you don't need to set PYTHONPATH if Python is installed on
your system.


I recommend setting SPARK_HOME, like the following:
export SPARK_HOME=/home/clash/sparks/spark-1.6.1-bin-hadoop12

Then restart Zeppelin and run your Python code again.
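
If you want to confirm that the setting took effect after the restart, a
quick check like the one below may help. It is only a sketch:
check_spark_home is a hypothetical helper name, and it assumes that the
interpreter process inherits the environment from Zeppelin and that the
Spark distribution keeps its Python sources under python/pyspark (which
matches the path in your traceback).

```python
import os


def check_spark_home(env=None):
    """Return a list of problems found with SPARK_HOME (empty means it looks fine)."""
    # Use the real process environment unless a mapping is passed in.
    env = os.environ if env is None else env
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        return ["SPARK_HOME is not set"]
    problems = []
    if not os.path.isdir(spark_home):
        problems.append("SPARK_HOME does not exist: %s" % spark_home)
    elif not os.path.isdir(os.path.join(spark_home, "python", "pyspark")):
        # Assumption: the distribution ships PySpark under python/pyspark.
        problems.append("no python/pyspark directory under %s" % spark_home)
    return problems


print(check_spark_home())
```

An empty list means the variable is set and the directory looks like a
Spark distribution; anything else tells you what to fix.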

Also, could you give the absolute path for logFile, like the following?
logFile = "/Users/user/hiv.data"
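
If you would rather not hard-code the full path, you could build it from a
base directory. This is a minimal sketch: absolute_log_path is a
hypothetical helper, and "/Users/user" is just the directory from my
example, so replace it with wherever hiv.data really lives. A relative
name such as "hiv.data" is resolved against the working directory of the
Spark process, which may not be the directory you expect.

```python
import os


def absolute_log_path(name, base_dir):
    """Join a file name onto a base directory and return a normalized absolute path."""
    return os.path.abspath(os.path.join(base_dir, name))


# "/Users/user" is only the example directory; use your own data directory.
logFile = absolute_log_path("hiv.data", "/Users/user")
print(logFile)

# In a %pyspark paragraph, where Zeppelin provides `sc`, the rest is unchanged:
# logData = sc.textFile(logFile).cache()
```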


2017-02-01 11:48 GMT+09:00 mingda li <[email protected]>:

> Dear all,
>
> We are using Zeppelin, and I have added the line
> export PYTHONPATH=/home/clash/sparks/spark-1.6.1-bin-hadoop12/python
> to zeppelin-env.sh.
> But each time I want to use pyspark, for example with this program:
>
> %pyspark
> from pyspark import SparkContext
> logFile = "hiv.data"
> logData = sc.textFile(logFile).cache()
> numAs = logData.filter(lambda s: 'a' in s).count()
> numBs = logData.filter(lambda s: 'b' in s).count()
> print "Lines with a: %i, lines with b: %i" % (numAs, numBs)
>
> It runs well the first time. But when I run it a second time, I get this
> error:
> Traceback (most recent call last):
>   File "/tmp/zeppelin_pyspark-4018989172273347075.py", line 238, in <module>
>     sc.setJobGroup(jobGroup, "Zeppelin")
>   File "/home/clash/sparks/spark-1.6.1-bin-hadoop12/python/pyspark/context.py", line 876, in setJobGroup
>     self._jsc.setJobGroup(groupId, description, interruptOnCancel)
> AttributeError: 'NoneType' object has no attribute 'setJobGroup'
>
> I need to rm /tmp/zeppelin_pyspark-4018989172273347075.py and start
> Zeppelin again to make it work.
> Does anyone have an idea why?
>
> Thanks
>
