You need to make sure the Spark driver machine has this package installed. And since you are using yarn-cluster mode via Livy, you have to install this package on all nodes, because the Spark driver could be launched on any node of the cluster.
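For example, a minimal per-node install sketch (assuming Rscript is on each node's PATH, and reusing the local repo path file:///mnt/data/tmp/r from the thread below; run it once on every node, e.g. via ssh, pssh, or your cluster manager):

import subprocess

# one-off install of changepoint from the local CRAN-style repo;
# run this once on every node of the YARN cluster
subprocess.check_call(
    ["Rscript", "-e",
     'install.packages("changepoint", repos="file:///mnt/data/tmp/r")'])

# quick sanity check that the package is now visible to Rscript
subprocess.check_call(["Rscript", "-e", "library(changepoint)"])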
Lian Jiang <jiangok2...@gmail.com> wrote on Thu, Aug 30, 2018 at 1:46 AM:

> After calling a sample R script, we found another issue when running a
> real R script. This R script failed to load the changepoint library.
>
> I tried:
>
> %livy2.sparkr
> install.packages("changepoint", repos="file:///mnt/data/tmp/r")
> library(changepoint)  # I see "Successfully loaded changepoint package
> version 2.2.2"
>
> %livy2.pyspark
> from pyspark import SparkFiles
> import subprocess
>
> sc.addFile("hdfs:///user/zeppelin/test.r")
> testpath = SparkFiles.get('test.r')
> stdoutdata = subprocess.getoutput("Rscript " + testpath)
> print(stdoutdata)
>
> The error: Error in library(changepoint) : there is no package called
> ‘changepoint’
>
> test.r is simply:
>
> library(changepoint)
>
> Any idea how to make changepoint available for the R script? Thanks.
>
> On Tue, Aug 28, 2018 at 10:07 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>
>> Thanks Jeff.
>>
>> This worked:
>>
>> %livy2.pyspark
>> from pyspark import SparkFiles
>> import subprocess
>>
>> sc.addFile("hdfs:///user/zeppelin/ocic/test.r")
>> testpath = SparkFiles.get('test.r')
>> stdoutdata = subprocess.getoutput("Rscript " + testpath)
>> print(stdoutdata)
>>
>> Cheers!
>>
>> On Tue, Aug 28, 2018 at 6:09 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> Do you run it under yarn-cluster mode? Then you must ensure your
>>> R script is shipped to the driver (via sc.addFile or by setting
>>> livy.spark.files).
>>>
>>> You also need to make sure you have R installed on all hosts of the
>>> YARN cluster, because the driver may run on any node of the cluster.
>>>
>>> Lian Jiang <jiangok2...@gmail.com> wrote on Wed, Aug 29, 2018 at 1:35 AM:
>>>
>>>> Thanks Lucas. We tried and got the same error. Below is the code:
>>>>
>>>> %livy2.pyspark
>>>> import subprocess
>>>> sc.addFile("hdfs:///user/zeppelin/test.r")
>>>> stdoutdata = subprocess.getoutput("Rscript test.r")
>>>> print(stdoutdata)
>>>>
>>>> Fatal error: cannot open file 'test.r': No such file or directory
>>>>
>>>> sc.addFile adds test.r to the Spark context. However, subprocess does
>>>> not use the Spark context.
>>>>
>>>> An HDFS path does not work either:
>>>> subprocess.getoutput("Rscript hdfs:///user/zeppelin/test.r")
>>>>
>>>> Any idea how to make Python call an R script? Appreciate it!
>>>>
>>>> On Tue, Aug 28, 2018 at 1:13 AM Partridge, Lucas (GE Aviation)
>>>> <lucas.partri...@ge.com> wrote:
>>>>
>>>>> Have you tried SparkContext.addFile() (not addPyFile()) to add your
>>>>> R script?
>>>>>
>>>>> https://spark.apache.org/docs/2.2.0/api/python/pyspark.html#pyspark.SparkContext.addFile
>>>>>
>>>>> From: Lian Jiang <jiangok2...@gmail.com>
>>>>> Sent: 27 August 2018 22:42
>>>>> To: users@zeppelin.apache.org
>>>>> Subject: EXT: Python script calls R script in Zeppelin on Hadoop
>>>>>
>>>>> Hi,
>>>>>
>>>>> We are using HDP 3.0 (with Zeppelin 0.8.0) and are migrating Jupyter
>>>>> notebooks to Zeppelin. One issue we came across is that a Python
>>>>> script calling an R script does not work in Zeppelin.
>>>>>
>>>>> %livy2.pyspark
>>>>> import os
>>>>> sc.addPyFile("hdfs:///user/zeppelin/my.py")
>>>>> import my
>>>>> my.test()
>>>>>
>>>>> my.test() calls the R script like: ['Rscript', 'myR.r']
>>>>>
>>>>> Fatal error: cannot open file 'myR.r': No such file or directory
>>>>>
>>>>> When running this notebook in Jupyter, both my.py and myR.r exist in
>>>>> the same folder. I understand the story changes on Hadoop because the
>>>>> scripts run in containers.
>>>>>
>>>>> My question:
>>>>> Is this scenario supported in Zeppelin? How do I add an R script into
>>>>> a Python Spark context so that the Python script can find the R
>>>>> script? Appreciate it!
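If installing changepoint on every node up front is not an option, a hedged alternative is to install it into a job-local library from the same %livy2.pyspark paragraph right before running the script. This is only a sketch: it assumes Rscript is on the driver node's PATH, that the local repo file:///mnt/data/tmp/r from the thread above also exists on that node, and /tmp/rlibs is a made-up example path.

%livy2.pyspark
from pyspark import SparkFiles
import os
import subprocess

sc.addFile("hdfs:///user/zeppelin/test.r")   # ship the script to the driver's work dir

libdir = "/tmp/rlibs"                        # hypothetical job-local R library
os.makedirs(libdir, exist_ok=True)
env = dict(os.environ, R_LIBS_USER=libdir)   # make Rscript search it for packages

# install changepoint on whichever node the driver happened to land on
subprocess.check_call(
    ["Rscript", "-e",
     'install.packages("changepoint", repos="file:///mnt/data/tmp/r", lib="'
     + libdir + '")'])

# run the R script with the same environment so library(changepoint) resolves
print(subprocess.check_output(
    ["Rscript", SparkFiles.get("test.r")], env=env).decode())

The install runs once per driver launch, so it adds startup time; baking the package into every node (as in the first sketch) remains the more robust route.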