Hi Ted,

Thank you for the reply. sc._jvm works on the driver, but how can I
reach the JVM inside rdd.map?
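
If I read the traceback below correctly, SparkContext._active_spark_context
is None inside the Python worker processes, because the Py4J gateway only
lives in the driver. The one workaround I have found is to keep the map
function in pure Python (a minimal sketch, assuming a plain int conversion
is enough for the toy case):

$ bin/pyspark
>>> data = sc.parallelize(["123", "234"])
>>> # int() runs entirely in the Python worker, so no _jvm access is needed
>>> numbers = data.map(lambda s: int(s.strip()))
>>> numbers.collect()
[123, 234]

That works for this toy example, but it sidesteps the question: is there a
supported way to call into the JVM from inside rdd.map?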

2015-09-29 11:26 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
> >>> sc._jvm.java.lang.Integer.valueOf("12")
> 12
>
> FYI
>
> On Mon, Sep 28, 2015 at 8:08 PM, YiZhi Liu <javeli...@gmail.com> wrote:
>>
>> Hi,
>>
>> I'm doing some data processing with PySpark, but I failed to reach the
>> JVM from the workers. Here is what I did:
>>
>> $ bin/pyspark
>> >>> data = sc.parallelize(["123", "234"])
>> >>> numbers = data.map(lambda s:
>> ...     SparkContext._active_spark_context._jvm.java.lang.Integer.valueOf(s.strip()))
>> >>> numbers.collect()
>>
>> I got,
>>
>> Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>>   File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
>>     process()
>>   File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>   File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
>>     vs = list(itertools.islice(iterator, batch))
>>   File "<stdin>", line 1, in <lambda>
>> AttributeError: 'NoneType' object has no attribute '_jvm'
>>
>> at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
>> at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
>> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>> at org.apache.spark.scheduler.Task.run(Task.scala:88)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> ... 1 more
>>
>> While _jvm works fine on the driver side:
>>
>> >>> SparkContext._active_spark_context._jvm.java.lang.Integer.valueOf("123".strip())
>> 123
>>
>> The program is trivial; I just wonder what the right way is to reach the
>> JVM from Python. Any help would be appreciated.
>>
>> Thanks
>>
>> --
>> Yizhi Liu
>> Senior Software Engineer / Data Mining
>> www.mvad.com, Shanghai, China
>>
>



-- 
Yizhi Liu
Senior Software Engineer / Data Mining
www.mvad.com, Shanghai, China

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
