Hi, I'm doing some data processing with PySpark, but I can't reach the JVM from the workers. Here is what I did:
$ bin/pyspark
>>> data = sc.parallelize(["123", "234"])
>>> numbers = data.map(lambda s: SparkContext._active_spark_context._jvm.java.lang.Integer.valueOf(s.strip()))
>>> numbers.collect()

I got:

Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
    process()
  File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute '_jvm'

        at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
        at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        ... 1 more

Meanwhile, _jvm looks fine at the driver end:

>>> SparkContext._active_spark_context._jvm.java.lang.Integer.valueOf("123".strip())
123

The program itself is trivial; I just wonder what the right way is to reach the JVM from Python. Any help would be appreciated.

Thanks

--
Yizhi Liu
Senior Software Engineer / Data Mining
www.mvad.com, Shanghai, China
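P.S. For comparison, here is a minimal sketch of the same job with the conversion kept in plain Python (int() instead of java.lang.Integer), so nothing touches _jvm inside the worker lambda. This version runs fine for me, which is why I suspect the Py4J gateway simply isn't available in the worker processes:

$ bin/pyspark
>>> data = sc.parallelize(["123", "234"])
>>> # plain Python conversion, no JVM access in the workers
>>> numbers = data.map(lambda s: int(s.strip()))
>>> numbers.collect()
[123, 234]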