Hi,

I'm doing some data processing with PySpark, but I can't reach the JVM
from the workers. Here is what I did:

$ bin/pyspark
>>> data = sc.parallelize(["123", "234"])
>>> numbers = data.map(lambda s:
...     SparkContext._active_spark_context._jvm.java.lang.Integer.valueOf(s.strip()))
>>> numbers.collect()

I got:

Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
    process()
  File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute '_jvm'

at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 more

The AttributeError suggests that _active_spark_context is None inside the
worker process, while _jvm on the driver side works fine:

>>> SparkContext._active_spark_context._jvm.java.lang.Integer.valueOf("123".strip())
123

The program itself is trivial; I just wonder what the right way is to
reach the JVM from Python on the workers. Any help would be appreciated.
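
For comparison, here is a pure-Python version of the same mapping (just a
sketch, assuming a plain int() conversion is acceptable here). It runs
without any problem, so the question is really only about calling into
the JVM from the workers:

>>> data = sc.parallelize(["123", "234"])
>>> # int() runs entirely in the Python worker, no JVM gateway needed
>>> numbers = data.map(lambda s: int(s.strip()))
>>> numbers.collect()
[123, 234]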

Thanks

-- 
Yizhi Liu
Senior Software Engineer / Data Mining
www.mvad.com, Shanghai, China
