To configure the Python executable used by PySpark, see the "Using the
Shell" Python section in the Spark Programming Guide:
https://spark.apache.org/docs/latest/programming-guide.html#using-the-shell

You can set the PYSPARK_PYTHON environment variable to choose the Python
executable that will be used on the driver and executors.  In addition, you
can set PYSPARK_DRIVER_PYTHON to use a different Python executable only on
the driver (this is useful if you want to use IPython on the driver but not
on the executors).
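
For example, assuming an Anaconda installation under /opt/anaconda (a
placeholder path -- substitute the actual location on your nodes), you could
set these variables in conf/spark-env.sh or in the shell before launching
PySpark:

    # Point the driver and executors at the Anaconda interpreter
    # (the path is just an example; adjust it for your cluster)
    export PYSPARK_PYTHON=/opt/anaconda/bin/python

    # Optionally run IPython on the driver only
    export PYSPARK_DRIVER_PYTHON=ipython

    ./bin/pyspark

Note that the interpreter named by PYSPARK_PYTHON has to exist at that same
path on every worker node, or the executors will fail to start Python workers.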

On Tue, Dec 30, 2014 at 11:13 AM, JAGANADH G <jagana...@gmail.com> wrote:

> Hi
>
> I am using Anaconda Python. Is there any way to specify the Python which we
> have to use for running PySpark in a cluster?
>
> Best regards
>
> Jagan
>
> On Tue, Dec 30, 2014 at 6:27 PM, Eric Friedman <eric.d.fried...@gmail.com>
> wrote:
>
>> The Python installed in your cluster is 2.5. You need at least 2.6.
>>
>> ----
>> Eric Friedman
>>
>> > On Dec 30, 2014, at 7:45 AM, Jaggu <jagana...@gmail.com> wrote:
>> >
>> > Hi Team,
>> >
>> > I was trying to execute a PySpark job on a cluster. It gives me the
>> > following error. (When I run the same job in local mode it works fine,
>> > though :-()
>> >
>> > Error:
>> >
>> > Error from python worker:
>> >  /usr/lib/spark-1.2.0-bin-hadoop2.3/python/pyspark/context.py:209:
>> > Warning: 'with' will become a reserved keyword in Python 2.6
>> >  Traceback (most recent call last):
>> >    File "/home/beehive/toolchain/x86_64-unknown-linux-gnu/python-2.5.2/lib/python2.5/runpy.py", line 85, in run_module
>> >      loader = get_loader(mod_name)
>> >    File "/home/beehive/toolchain/x86_64-unknown-linux-gnu/python-2.5.2/lib/python2.5/pkgutil.py", line 456, in get_loader
>> >      return find_loader(fullname)
>> >    File "/home/beehive/toolchain/x86_64-unknown-linux-gnu/python-2.5.2/lib/python2.5/pkgutil.py", line 466, in find_loader
>> >      for importer in iter_importers(fullname):
>> >    File "/home/beehive/toolchain/x86_64-unknown-linux-gnu/python-2.5.2/lib/python2.5/pkgutil.py", line 422, in iter_importers
>> >      __import__(pkg)
>> >    File "/usr/lib/spark-1.2.0-bin-hadoop2.3/python/pyspark/__init__.py", line 41, in <module>
>> >      from pyspark.context import SparkContext
>> >    File "/usr/lib/spark-1.2.0-bin-hadoop2.3/python/pyspark/context.py", line 209
>> >      with SparkContext._lock:
>> >                      ^
>> >  SyntaxError: invalid syntax
>> > PYTHONPATH was:
>> > /usr/lib/spark-1.2.0-bin-hadoop2.3/python:/usr/lib/spark-1.2.0-bin-hadoop2.3/python/lib/py4j-0.8.2.1-src.zip:/usr/lib/spark-1.2.0-bin-hadoop2.3/lib/spark-assembly-1.2.0-hadoop2.3.0.jar:/usr/lib/spark-1.2.0-bin-hadoop2.3/sbin/../python/lib/py4j-0.8.2.1-src.zip:/usr/lib/spark-1.2.0-bin-hadoop2.3/sbin/../python:/home/beehive/bin/utils/primitives:/home/beehive/bin/utils/pylogger:/home/beehive/bin/utils/asterScript:/home/beehive/bin/lib:/home/beehive/bin/utils/init:/home/beehive/installer/packages:/home/beehive/ncli
>> > java.io.EOFException
>> >        at java.io.DataInputStream.readInt(DataInputStream.java:392)
>> >        at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:163)
>> >        at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86)
>> >        at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
>> >        at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:102)
>> >        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
>> >        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>> >        at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>> >        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>> >        at org.apache.spark.scheduler.Task.run(Task.scala:56)
>> >        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>> >        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >        at java.lang.Thread.run(Thread.java:722)
>> >
>> > 14/12/31 04:49:58 INFO TaskSetManager: Starting task 0.1 in stage 0.0
>> > (TID 1, aster4, NODE_LOCAL, 1321 bytes)
>> > 14/12/31 04:49:58 INFO BlockManagerInfo: Added broadcast_2_piece0 in
>> > memory on aster4:43309 (size: 3.8 KB, free: 265.0 MB)
>> > 14/12/31 04:49:59 INFO TaskSetManager: Lost task 0.1 in stage 0.0
>> > (TID 1) on executor aster4: org.apache.spark.SparkException (
>> >
>> >
>> > Any clue how to resolve this?
>> >
>> > Best regards
>> >
>> > Jagan
>> >
>> >
>> >
>>
>
>
>
> --
> **********************************
> JAGANADH G
> http://jaganadhg.in
> *ILUGCBE*
> http://ilugcbe.org.in
>
