You need to make sure the Spark driver machine has this package installed. And since you are using yarn-cluster mode via Livy, you have to install this package on all nodes, because the Spark driver could be launched on any node of the cluster.
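For example, a minimal per-node install sketch (assuming Rscript is on each node's PATH, and reusing the local repo path file:///mnt/data/tmp/r from the thread below; run it once on every node, e.g. via ssh, pssh, or your cluster manager):

import subprocess

# one-off install of changepoint from the local CRAN-style repo;
# run this once on every node of the YARN cluster
subprocess.check_call(
    ["Rscript", "-e",
     'install.packages("changepoint", repos="file:///mnt/data/tmp/r")'])

# quick sanity check that the package is now visible to Rscript
subprocess.check_call(["Rscript", "-e", "library(changepoint)"])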
Lian Jiang <jiangok2...@gmail.com> wrote on Thu, Aug 30, 2018 at 1:46 AM:

> After calling a sample R script, we found another issue when running a
> real R script. This R script failed to load the changepoint library.
>
> I tried:
>
> %livy2.sparkr
> install.packages("changepoint", repos="file:///mnt/data/tmp/r")
> library(changepoint)  # I see "Successfully loaded changepoint package
> version 2.2.2"
>
> %livy2.pyspark
> from pyspark import SparkFiles
> import subprocess
>
> sc.addFile("hdfs:///user/zeppelin/test.r")
> testpath = SparkFiles.get('test.r')
> stdoutdata = subprocess.getoutput("Rscript " + testpath)
> print(stdoutdata)
>
> The error: Error in library(changepoint) : there is no package called
> ‘changepoint’
>
> test.r is simply:
>
> library(changepoint)
>
> Any idea how to make changepoint available for the R script? Thanks.
>
> On Tue, Aug 28, 2018 at 10:07 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>
>> Thanks Jeff.
>>
>> This worked:
>>
>> %livy2.pyspark
>> from pyspark import SparkFiles
>> import subprocess
>>
>> sc.addFile("hdfs:///user/zeppelin/ocic/test.r")
>> testpath = SparkFiles.get('test.r')
>> stdoutdata = subprocess.getoutput("Rscript " + testpath)
>> print(stdoutdata)
>>
>> Cheers!
>>
>> On Tue, Aug 28, 2018 at 6:09 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> Do you run it under yarn-cluster mode? Then you must ensure your
>>> R script is shipped to the driver (via sc.addFile or by setting
>>> livy.spark.files).
>>>
>>> You also need to make sure you have R installed on all hosts of the
>>> YARN cluster, because the driver may run on any node of the cluster.
>>>
>>> Lian Jiang <jiangok2...@gmail.com> wrote on Wed, Aug 29, 2018 at 1:35 AM:
>>>
>>>> Thanks Lucas. We tried and got the same error. Below is the code:
>>>>
>>>> %livy2.pyspark
>>>> import subprocess
>>>> sc.addFile("hdfs:///user/zeppelin/test.r")
>>>> stdoutdata = subprocess.getoutput("Rscript test.r")
>>>> print(stdoutdata)
>>>>
>>>> Fatal error: cannot open file 'test.r': No such file or directory
>>>>
>>>> sc.addFile adds test.r to the Spark context. However, subprocess does
>>>> not use the Spark context.
>>>>
>>>> An HDFS path does not work either:
>>>> subprocess.getoutput("Rscript hdfs:///user/zeppelin/test.r")
>>>>
>>>> Any idea how to make Python call an R script? Appreciate it!
>>>>
>>>> On Tue, Aug 28, 2018 at 1:13 AM Partridge, Lucas (GE Aviation)
>>>> <lucas.partri...@ge.com> wrote:
>>>>
>>>>> Have you tried SparkContext.addFile() (not addPyFile()) to add your
>>>>> R script?
>>>>>
>>>>> https://spark.apache.org/docs/2.2.0/api/python/pyspark.html#pyspark.SparkContext.addFile
>>>>>
>>>>> From: Lian Jiang <jiangok2...@gmail.com>
>>>>> Sent: 27 August 2018 22:42
>>>>> To: users@zeppelin.apache.org
>>>>> Subject: EXT: Python script calls R script in Zeppelin on Hadoop
>>>>>
>>>>> Hi,
>>>>>
>>>>> We are using HDP 3.0 (with Zeppelin 0.8.0) and are migrating Jupyter
>>>>> notebooks to Zeppelin. One issue we came across is that a Python
>>>>> script calling an R script does not work in Zeppelin.
>>>>>
>>>>> %livy2.pyspark
>>>>> import os
>>>>> sc.addPyFile("hdfs:///user/zeppelin/my.py")
>>>>> import my
>>>>> my.test()
>>>>>
>>>>> my.test() calls the R script like: ['Rscript', 'myR.r']
>>>>>
>>>>> Fatal error: cannot open file 'myR.r': No such file or directory
>>>>>
>>>>> When running this notebook in Jupyter, both my.py and myR.r exist in
>>>>> the same folder. I understand the story changes on Hadoop because the
>>>>> scripts run in containers.
>>>>>
>>>>> My question:
>>>>> Is this scenario supported in Zeppelin? How do I add an R script into
>>>>> a Python Spark context so that the Python script can find the R
>>>>> script? Appreciate it!
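If installing changepoint on every node up front is not an option, a hedged alternative is to install it into a job-local library from the same %livy2.pyspark paragraph right before running the script. This is only a sketch: it assumes Rscript is on the driver node's PATH, that the local repo file:///mnt/data/tmp/r from the thread above also exists on that node, and /tmp/rlibs is a made-up example path.

%livy2.pyspark
from pyspark import SparkFiles
import os
import subprocess

sc.addFile("hdfs:///user/zeppelin/test.r")   # ship the script to the driver's work dir

libdir = "/tmp/rlibs"                        # hypothetical job-local R library
os.makedirs(libdir, exist_ok=True)
env = dict(os.environ, R_LIBS_USER=libdir)   # make Rscript search it for packages

# install changepoint on whichever node the driver happened to land on
subprocess.check_call(
    ["Rscript", "-e",
     'install.packages("changepoint", repos="file:///mnt/data/tmp/r", lib="'
     + libdir + '")'])

# run the R script with the same environment so library(changepoint) resolves
print(subprocess.check_output(
    ["Rscript", SparkFiles.get("test.r")], env=env).decode())

The install runs once per driver launch, so it adds startup time; baking the package into every node (as in the first sketch) remains the more robust route.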