Jeff, R is installed on the namenode and all data nodes, and the R packages have been copied to all of them too. I am not sure whether an R script launched by pyspark's subprocess can access the Spark context; if it cannot, using sc.addFile to add the R packages to the Spark context will not help test.r load them. Thanks for the clue.
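One way to sidestep the Spark-context question entirely is to point the Rscript child process at the R library that install.packages() wrote to, via the R_LIBS environment variable. A minimal sketch, assuming Python 3; the library path below is a placeholder, not something confirmed in this thread, so substitute whatever .libPaths() reports in the %livy2.sparkr session (quoted below) that installed changepoint:

%livy2.pyspark
import os
import subprocess
from pyspark import SparkFiles

sc.addFile("hdfs:///user/zeppelin/test.r")
testpath = SparkFiles.get('test.r')

# R prepends R_LIBS to its library search path, so the Rscript child
# process can find changepoint without touching the Spark context.
# The path is an assumed placeholder; check .libPaths() in %livy2.sparkr.
os.environ["R_LIBS"] = "/usr/lib64/R/library"

stdoutdata = subprocess.getoutput("Rscript " + testpath)
print(stdoutdata)

Since os.environ is inherited by the subprocess, test.r itself needs no changes. Note this only helps on the node where the packages were actually installed, which is Jeff's point about yarn-cluster mode below.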
On Wed, Aug 29, 2018 at 7:24 PM Jeff Zhang <zjf...@gmail.com> wrote:

> You need to make sure the Spark driver machine has this package
> installed. And since you are using yarn-cluster mode via Livy, you have
> to install this package on all nodes, because the Spark driver could be
> launched on any node of the cluster.
>
> Lian Jiang <jiangok2...@gmail.com> wrote on Thursday, August 30, 2018 at 1:46 AM:
>
>> After calling a sample R script successfully, we found another issue
>> when running a real R script: it failed to load the changepoint library.
>>
>> I tried:
>>
>> %livy2.sparkr
>> install.packages("changepoint", repos="file:///mnt/data/tmp/r")
>> library(changepoint)  # I see "Successfully loaded changepoint package version 2.2.2"
>>
>> %livy2.pyspark
>> from pyspark import SparkFiles
>> import subprocess
>>
>> sc.addFile("hdfs:///user/zeppelin/test.r")
>> testpath = SparkFiles.get('test.r')
>> stdoutdata = subprocess.getoutput("Rscript " + testpath)
>> print(stdoutdata)
>>
>> The error: Error in library(changepoint) : there is no package called
>> ‘changepoint’
>>
>> test.r is simply:
>>
>> library(changepoint)
>>
>> Any idea how to make changepoint available to the R script? Thanks.
>>
>> On Tue, Aug 28, 2018 at 10:07 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>>
>>> Thanks Jeff.
>>>
>>> This worked:
>>>
>>> %livy2.pyspark
>>> from pyspark import SparkFiles
>>> import subprocess
>>>
>>> sc.addFile("hdfs:///user/zeppelin/ocic/test.r")
>>> testpath = SparkFiles.get('test.r')
>>> stdoutdata = subprocess.getoutput("Rscript " + testpath)
>>> print(stdoutdata)
>>>
>>> Cheers!
>>>
>>> On Tue, Aug 28, 2018 at 6:09 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>
>>>> Do you run it in yarn-cluster mode? Then you must ensure your R script
>>>> is shipped to the driver (via sc.addFile or by setting livy.spark.files).
>>>>
>>>> You also need to make sure R is installed on all hosts of the YARN
>>>> cluster, because the driver may run on any node.
>>>>
>>>> Lian Jiang <jiangok2...@gmail.com> wrote on Wednesday, August 29, 2018 at 1:35 AM:
>>>>
>>>>> Thanks Lucas. We tried and got the same error. Below is the code:
>>>>>
>>>>> %livy2.pyspark
>>>>> import subprocess
>>>>> sc.addFile("hdfs:///user/zeppelin/test.r")
>>>>> stdoutdata = subprocess.getoutput("Rscript test.r")
>>>>> print(stdoutdata)
>>>>>
>>>>> Fatal error: cannot open file 'test.r': No such file or directory
>>>>>
>>>>> sc.addFile adds test.r to the Spark context. However, the subprocess
>>>>> does not use the Spark context, so "Rscript test.r" cannot find the file.
>>>>>
>>>>> An HDFS path does not work either:
>>>>> subprocess.getoutput("Rscript hdfs:///user/zeppelin/test.r")
>>>>>
>>>>> Any idea how to make Python call an R script? Appreciated!
>>>>>
>>>>> On Tue, Aug 28, 2018 at 1:13 AM Partridge, Lucas (GE Aviation) <
>>>>> lucas.partri...@ge.com> wrote:
>>>>>
>>>>>> Have you tried SparkContext.addFile() (not addPyFile()) to add your
>>>>>> R script?
>>>>>>
>>>>>> https://spark.apache.org/docs/2.2.0/api/python/pyspark.html#pyspark.SparkContext.addFile
>>>>>>
>>>>>> *From:* Lian Jiang <jiangok2...@gmail.com>
>>>>>> *Sent:* 27 August 2018 22:42
>>>>>> *To:* users@zeppelin.apache.org
>>>>>> *Subject:* EXT: Python script calls R script in Zeppelin on Hadoop
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We are using HDP 3.0 (with Zeppelin 0.8.0) and are migrating Jupyter
>>>>>> notebooks to Zeppelin. One issue we came across is that a Python script
>>>>>> calling an R script does not work in Zeppelin.
>>>>>>
>>>>>> %livy2.pyspark
>>>>>> import os
>>>>>> sc.addPyFile("hdfs:///user/zeppelin/my.py")
>>>>>> import my
>>>>>> my.test()
>>>>>>
>>>>>> my.test() invokes the R script like: ['Rscript', 'myR.r']
>>>>>>
>>>>>> Fatal error: cannot open file 'myR.r': No such file or directory
>>>>>>
>>>>>> When running this notebook in Jupyter, both my.py and myR.r exist in
>>>>>> the same folder. I understand the story changes on Hadoop because the
>>>>>> scripts run in containers.
>>>>>>
>>>>>> My question: is this scenario supported in Zeppelin? How can I add an
>>>>>> R script to the PySpark context so that the Python script can find it?
>>>>>> Appreciated!
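Putting the later replies together, the pattern that eventually worked was to ship the file with sc.addFile() and resolve its local path with SparkFiles.get() before invoking Rscript. A minimal sketch of how the original my.py/myR.r scenario could be wired up the same way; the rewritten my.test() body is an assumption based on that pattern, not code from the thread:

%livy2.pyspark
sc.addPyFile("hdfs:///user/zeppelin/my.py")
sc.addFile("hdfs:///user/zeppelin/myR.r")  # ship the R script alongside my.py
import my
my.test()

where my.py resolves the shipped copy instead of a relative path:

import subprocess
from pyspark import SparkFiles

def test():
    # SparkFiles.get() returns the local path that sc.addFile() copied
    # myR.r to in this container, so Rscript can open it regardless of
    # the working directory.
    rpath = SparkFiles.get('myR.r')
    print(subprocess.getoutput("Rscript " + rpath))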