Hello. You don't need to remove /tmp/zeppelin_pyspark-4018989172273347075.py, because it is generated automatically when you run the pyspark command. I also think you don't need to set PYTHONPATH if you have Python installed on your system.
I recommend setting SPARK_HOME instead, like the following.
*export SPARK_HOME=/home/clash/sparks/spark-1.6.1-bin-hadoop12*
Then restart Zeppelin and run your Python code again.

Also, could you give the absolute path for logFile, like the following?
logFile = "/Users/user/hiv.data"

2017-02-01 11:48 GMT+09:00 mingda li <[email protected]>:
> Dear all,
>
> We are using Zeppelin. And I have added the export
> PYTHONPATH=/home/clash/sparks/spark-1.6.1-bin-hadoop12/python
> to zeppelin-env.sh.
> But each time, when I want to use pyspark, for example the program:
>
> %pyspark
> from pyspark import SparkContext
> logFile = "hiv.data"
> logData = sc.textFile(logFile).cache()
> numAs = logData.filter(lambda s: 'a' in s).count()
> numBs = logData.filter(lambda s: 'b' in s).count()
> print "Lines with a: %i, lines with b: %i" % (numAs, numBs)
>
> It can firstly run well. But second time, I run it again I will get such
> error:
> *Traceback (most recent call last):*
> * File "/tmp/zeppelin_pyspark-4018989172273347075.py", line 238, in
> <module>*
> * sc.setJobGroup(jobGroup, "Zeppelin")*
> * File
> "/home/clash/sparks/spark-1.6.1-bin-hadoop12/python/pyspark/context.py",
> line 876, in setJobGroup*
> * self._jsc.setJobGroup(groupId, description, interruptOnCancel)*
> *AttributeError: 'NoneType' object has no attribute 'setJobGroup'*
>
> *I need to rm */tmp/zeppelin_pyspark-4018989172273347075.py and start
> zeppelin again to let it work.
> Does anyone have idea why?
>
> Thanks
>
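
To illustrate the absolute-path suggestion above, here is a minimal sketch of building the path in plain Python before handing it to Spark. The helper name absolute_log_path is hypothetical, and "/Users/user" is only the example directory from my reply, so please replace it with the directory where hiv.data really lives.

```python
import os


def absolute_log_path(file_name, base_dir):
    """Join base_dir and file_name into an absolute, normalized path.

    Hypothetical helper for illustration; it is not part of Zeppelin or Spark.
    """
    return os.path.abspath(os.path.join(base_dir, file_name))


# "/Users/user" is an assumed example directory, not a real location.
logFile = absolute_log_path("hiv.data", "/Users/user")
print(logFile)
```

In a %pyspark paragraph you would then pass logFile to sc.textFile(logFile) as in your original program, using the sc that Zeppelin already provides.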
