Hi Ted,

Thank you for your reply. sc works at the driver, but how can I reach the
JVM inside rdd.map?
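For what it's worth, the pure-Python equivalent runs fine inside rdd.map,
since it never touches the JVM (a minimal sketch, assuming simple integer
parsing is the end goal):

>>> data = sc.parallelize(["123", "234"])
>>> numbers = data.map(lambda s: int(s.strip()))  # runs in the Python worker, no JVM call
>>> numbers.collect()
[123, 234]

The map closure executes in the Python worker processes, where
SparkContext._active_spark_context is None and no py4j gateway exists;
that is why ._jvm is unreachable there.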
2015-09-29 11:26 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
> >>> sc._jvm.java.lang.Integer.valueOf("12")
> 12
>
> FYI
>
> On Mon, Sep 28, 2015 at 8:08 PM, YiZhi Liu <javeli...@gmail.com> wrote:
>>
>> Hi,
>>
>> I'm doing some data processing on pyspark, but I failed to reach the JVM
>> in the workers. Here is what I did:
>>
>> $ bin/pyspark
>> >>> data = sc.parallelize(["123", "234"])
>> >>> numbers = data.map(lambda s:
>> ...     SparkContext._active_spark_context._jvm.java.lang.Integer.valueOf(s.strip()))
>> >>> numbers.collect()
>>
>> I got,
>>
>> Caused by: org.apache.spark.api.python.PythonException: Traceback
>> (most recent call last):
>>   File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/worker.py",
>> line 111, in main
>>     process()
>>   File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/worker.py",
>> line 106, in process
>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>   File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/serializers.py",
>> line 263, in dump_stream
>>     vs = list(itertools.islice(iterator, batch))
>>   File "<stdin>", line 1, in <lambda>
>> AttributeError: 'NoneType' object has no attribute '_jvm'
>>
>>   at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
>>   at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
>>   at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>   at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>   ... 1 more
>>
>> While _jvm at the driver end looks fine:
>>
>> >>> SparkContext._active_spark_context._jvm.java.lang.Integer.valueOf("123".strip())
>> 123
>>
>> The program is trivial; I just wonder what the right way is to reach the
>> JVM from Python. Any help would be appreciated.
>>
>> Thanks
>>
>> --
>> Yizhi Liu
>> Senior Software Engineer / Data Mining
>> www.mvad.com, Shanghai, China

--
Yizhi Liu
Senior Software Engineer / Data Mining
www.mvad.com, Shanghai, China
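A related sketch: if the point is to have the per-record work execute in the
JVM rather than in the Python workers, one route in this era of Spark is to
express it as DataFrame operations, which are evaluated JVM-side (assuming
the 1.5-style SQLContext API, and that trimming and casting strings to
integers is representative of the real job):

>>> from pyspark.sql import SQLContext
>>> sqlContext = SQLContext(sc)
>>> df = sqlContext.createDataFrame([("123",), ("234",)], ["s"])
>>> df.selectExpr("cast(trim(s) as int) as n").collect()
[Row(n=123), Row(n=234)]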