Thanks Jeff.

This worked:

%livy2.pyspark
from pyspark import SparkFiles
import subprocess

sc.addFile("hdfs:///user/zeppelin/ocic/test.r")
testpath = SparkFiles.get('test.r')
stdoutdata = subprocess.getoutput("Rscript " + testpath)
print(stdoutdata)
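One caveat worth noting (an aside, not from the thread): subprocess.getoutput merges stderr into stdout and discards the exit code, so a failing Rscript call can look like it succeeded. subprocess.run exposes the return code. A minimal local sketch, using a Python one-liner as a stand-in for the Rscript command so it runs anywhere:

```python
import subprocess
import sys

# Stand-in for ["Rscript", testpath]: a child process that prints
# something and then exits with a non-zero status, like a failing script.
cmd = [sys.executable, "-c", "print('R output'); raise SystemExit(2)"]

# subprocess.getoutput would hide the failure; subprocess.run keeps it.
result = subprocess.run(cmd, capture_output=True, text=True)

print(result.stdout.strip())  # the child's stdout
print(result.returncode)      # non-zero signals the script failed

# On the real cluster you could also confirm up front that R is installed
# on the driver node: shutil.which("Rscript") returns its path, or None.
```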

Cheers!

On Tue, Aug 28, 2018 at 6:09 PM Jeff Zhang <zjf...@gmail.com> wrote:

> Are you running it in yarn-cluster mode? Then you must ensure your R script
> is shipped to the driver (via sc.addFile or by setting livy.spark.files).
>
> You also need to make sure R is installed on all hosts of the yarn
> cluster, because the driver may run on any node of the cluster.
>
>
>
> Lian Jiang <jiangok2...@gmail.com>于2018年8月29日周三 上午1:35写道:
>
>> Thanks Lucas. We tried and got the same error. Below is the code:
>>
>> %livy2.pyspark
>> import subprocess
>> sc.addFile("hdfs:///user/zeppelin/test.r")
>> stdoutdata = subprocess.getoutput("Rscript test.r")
>> print(stdoutdata)
>>
>> Fatal error: cannot open file 'test.r': No such file or directory
>>
>>
>> sc.addFile adds test.r to the Spark context. However, subprocess does not
>> use the Spark context.
>>
>> An HDFS path does not work either: subprocess.getoutput("Rscript
>> hdfs:///user/zeppelin/test.r")
>>
>> Any idea how to make Python call an R script? Appreciate it!
>>
>>
>>
>>
>> On Tue, Aug 28, 2018 at 1:13 AM Partridge, Lucas (GE Aviation) <
>> lucas.partri...@ge.com> wrote:
>>
>>> Have you tried SparkContext.addFile() (not addPyFile()) to add your R
>>> script?
>>>
>>>
>>> https://spark.apache.org/docs/2.2.0/api/python/pyspark.html#pyspark.SparkContext.addFile
>>>
>>>
>>>
>>> *From:* Lian Jiang <jiangok2...@gmail.com>
>>> *Sent:* 27 August 2018 22:42
>>> *To:* users@zeppelin.apache.org
>>> *Subject:* EXT: Python script calls R script in Zeppelin on Hadoop
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> We are using HDP 3.0 (with Zeppelin 0.8.0) and are migrating Jupyter
>>> notebooks to Zeppelin. One issue we came across is that a Python script
>>> calling an R script does not work in Zeppelin.
>>>
>>>
>>>
>>> %livy2.pyspark
>>>
>>> import os
>>>
>>> sc.addPyFile("hdfs:///user/zeppelin/my.py")
>>>
>>> import my
>>>
>>> my.test()
>>>
>>>
>>>
>>> my.test() calls the R script like: ['Rscript', 'myR.r']
>>>
>>>
>>>
>>> Fatal error: cannot open file 'myR.r': No such file or directory
>>>
>>>
>>>
>>> When running this notebook in Jupyter, both my.py and myR.r exist in the
>>> same folder. I understand the story changes on Hadoop because the scripts
>>> run in containers.
>>>
>>>
>>>
>>> My question:
>>>
>>> Is this scenario supported in Zeppelin? How do I add an R script to a
>>> Python Spark context so that the Python script can find the R script?
>>> Appreciate it!
>>>
>>>
>>>
>>>
>>>
>>
