After successfully calling a sample R script, we found another issue when running a real R script: it fails to load the changepoint library.
I tried:

%livy2.sparkr
install.packages("changepoint", repos="file:///mnt/data/tmp/r")
library(changepoint)
# I see "Successfully loaded changepoint package version 2.2.2"

%livy2.pyspark
from pyspark import SparkFiles
import subprocess

sc.addFile("hdfs:///user/zeppelin/test.r")
testpath = SparkFiles.get('test.r')
stdoutdata = subprocess.getoutput("Rscript " + testpath)
print(stdoutdata)

The error:

Error in library(changepoint) : there is no package called ‘changepoint’

test.r is simply:

library(changepoint)

Any idea how to make changepoint available to the R script? Thanks.

On Tue, Aug 28, 2018 at 10:07 PM Lian Jiang <jiangok2...@gmail.com> wrote:

> Thanks Jeff.
>
> This worked:
>
> %livy2.pyspark
> from pyspark import SparkFiles
> import subprocess
>
> sc.addFile("hdfs:///user/zeppelin/ocic/test.r")
> testpath = SparkFiles.get('test.r')
> stdoutdata = subprocess.getoutput("Rscript " + testpath)
> print(stdoutdata)
>
> Cheers!
>
> On Tue, Aug 28, 2018 at 6:09 PM Jeff Zhang <zjf...@gmail.com> wrote:
>
>> Are you running it in yarn-cluster mode? If so, you must ensure your R
>> script is shipped to the driver (via sc.addFile or by setting
>> livy.spark.files).
>>
>> You also need to make sure R is installed on all hosts of the YARN
>> cluster, because the driver may run on any node of the cluster.
>>
>> On Wed, Aug 29, 2018 at 1:35 AM Lian Jiang <jiangok2...@gmail.com> wrote:
>>
>>> Thanks Lucas. We tried and got the same error. Below is the code:
>>>
>>> %livy2.pyspark
>>> import subprocess
>>> sc.addFile("hdfs:///user/zeppelin/test.r")
>>> stdoutdata = subprocess.getoutput("Rscript test.r")
>>> print(stdoutdata)
>>>
>>> Fatal error: cannot open file 'test.r': No such file or directory
>>>
>>> sc.addFile adds test.r to the Spark context, but subprocess does not use
>>> the Spark context, so it cannot resolve the bare filename.
>>>
>>> An HDFS path does not work either:
>>> subprocess.getoutput("Rscript hdfs:///user/zeppelin/test.r")
>>>
>>> Any idea how to make Python call an R script? Appreciated!
>>>
>>> On Tue, Aug 28, 2018 at 1:13 AM Partridge, Lucas (GE Aviation) <
>>> lucas.partri...@ge.com> wrote:
>>>
>>>> Have you tried SparkContext.addFile() (not addPyFile()) to add your R
>>>> script?
>>>>
>>>> https://spark.apache.org/docs/2.2.0/api/python/pyspark.html#pyspark.SparkContext.addFile
>>>>
>>>> From: Lian Jiang <jiangok2...@gmail.com>
>>>> Sent: 27 August 2018 22:42
>>>> To: users@zeppelin.apache.org
>>>> Subject: EXT: Python script calls R script in Zeppelin on Hadoop
>>>>
>>>> Hi,
>>>>
>>>> We are using HDP 3.0 (with Zeppelin 0.8.0) and are migrating Jupyter
>>>> notebooks to Zeppelin. One issue we came across is that a Python script
>>>> calling an R script does not work in Zeppelin.
>>>>
>>>> %livy2.pyspark
>>>> import os
>>>> sc.addPyFile("hdfs:///user/zeppelin/my.py")
>>>> import my
>>>> my.test()
>>>>
>>>> my.test() calls the R script like: ['Rscript', 'myR.r']
>>>>
>>>> Fatal error: cannot open file 'myR.r': No such file or directory
>>>>
>>>> When running this notebook in Jupyter, both my.py and myR.r exist in
>>>> the same folder. I understand the story changes on Hadoop because the
>>>> scripts run in containers.
>>>>
>>>> My question: is this scenario supported in Zeppelin? How do I add an R
>>>> script to a PySpark context so that the Python script can find it?
>>>> Appreciated!
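[Editor's note] Two plausible causes of the "no package called 'changepoint'" error at the top of this thread, given how Livy works: the %livy2.sparkr and %livy2.pyspark interpreters may run as separate Livy sessions, so the package may have been installed on a different driver node than the one running the Rscript subprocess; and even on the same node, install.packages() in the SparkR session typically installs into that session's user library, which a freshly spawned Rscript does not search by default. Below is a minimal sketch of one workaround: install changepoint into a fixed directory visible on every node, then point the child Rscript at it via the R_LIBS_USER environment variable. The path /mnt/data/tmp/rlibs is hypothetical, and the sketch assumes changepoint has already been installed there.

%livy2.pyspark
import os
import subprocess
from pyspark import SparkFiles

# Ship the R script to the driver's Spark working directory, as in the
# working example earlier in the thread.
sc.addFile("hdfs:///user/zeppelin/test.r")
testpath = SparkFiles.get("test.r")

# R prepends R_LIBS_USER to .libPaths() on startup, so the child Rscript
# will search this directory first. /mnt/data/tmp/rlibs is a hypothetical
# location; changepoint must actually be installed there on whichever
# node the driver lands on.
env = dict(os.environ, R_LIBS_USER="/mnt/data/tmp/rlibs")

# subprocess.run with an explicit env, instead of subprocess.getoutput,
# which offers no way to pass environment variables.
result = subprocess.run(
    ["Rscript", testpath],
    env=env,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True,
)
print(result.stdout)

Equivalently, test.r itself could call .libPaths(c("/mnt/data/tmp/rlibs", .libPaths())) before library(changepoint). Either way, the library directory must hold the package on whichever node actually runs the Rscript, which echoes Jeff's point about R itself needing to be installed on every host.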