After successfully calling a sample R script, we found another issue when running a real R script: it fails to load the changepoint library.
I tried:

%livy2.sparkr
install.packages("changepoint", repos="file:///mnt/data/tmp/r")
library(changepoint)
# I see "Successfully loaded changepoint package version 2.2.2"

%livy2.pyspark
from pyspark import SparkFiles
import subprocess

sc.addFile("hdfs:///user/zeppelin/test.r")
testpath = SparkFiles.get('test.r')
stdoutdata = subprocess.getoutput("Rscript " + testpath)
print(stdoutdata)

The error:

Error in library(changepoint) : there is no package called ‘changepoint’

test.r is simply:

library(changepoint)

Any idea how to make changepoint available to the R script? Thanks.

On Tue, Aug 28, 2018 at 10:07 PM Lian Jiang <jiangok2...@gmail.com> wrote:

> Thanks Jeff.
>
> This worked:
>
> %livy2.pyspark
> from pyspark import SparkFiles
> import subprocess
>
> sc.addFile("hdfs:///user/zeppelin/ocic/test.r")
> testpath = SparkFiles.get('test.r')
> stdoutdata = subprocess.getoutput("Rscript " + testpath)
> print(stdoutdata)
>
> Cheers!
>
> On Tue, Aug 28, 2018 at 6:09 PM Jeff Zhang <zjf...@gmail.com> wrote:
>
>> Are you running it in yarn-cluster mode? If so, you must ensure your R
>> script is shipped to the driver (via sc.addFile or by setting
>> livy.spark.files).
>>
>> You also need to make sure R is installed on all hosts of the YARN
>> cluster, because the driver may run on any node of the cluster.
>>
>> On Wed, Aug 29, 2018 at 1:35 AM Lian Jiang <jiangok2...@gmail.com> wrote:
>>
>>> Thanks Lucas. We tried and got the same error. Below is the code:
>>>
>>> %livy2.pyspark
>>> import subprocess
>>> sc.addFile("hdfs:///user/zeppelin/test.r")
>>> stdoutdata = subprocess.getoutput("Rscript test.r")
>>> print(stdoutdata)
>>>
>>> Fatal error: cannot open file 'test.r': No such file or directory
>>>
>>> sc.addFile adds test.r to the Spark context, but subprocess does not use
>>> the Spark context, so it cannot resolve the bare filename.
>>>
>>> An HDFS path does not work either:
>>> subprocess.getoutput("Rscript hdfs:///user/zeppelin/test.r")
>>>
>>> Any idea how to make Python call an R script? Appreciated!
>>>
>>> On Tue, Aug 28, 2018 at 1:13 AM Partridge, Lucas (GE Aviation) <
>>> lucas.partri...@ge.com> wrote:
>>>
>>>> Have you tried SparkContext.addFile() (not addPyFile()) to add your R
>>>> script?
>>>>
>>>> https://spark.apache.org/docs/2.2.0/api/python/pyspark.html#pyspark.SparkContext.addFile
>>>>
>>>> From: Lian Jiang <jiangok2...@gmail.com>
>>>> Sent: 27 August 2018 22:42
>>>> To: users@zeppelin.apache.org
>>>> Subject: EXT: Python script calls R script in Zeppelin on Hadoop
>>>>
>>>> Hi,
>>>>
>>>> We are using HDP 3.0 (with Zeppelin 0.8.0) and are migrating Jupyter
>>>> notebooks to Zeppelin. One issue we came across is that a Python script
>>>> calling an R script does not work in Zeppelin.
>>>>
>>>> %livy2.pyspark
>>>> import os
>>>> sc.addPyFile("hdfs:///user/zeppelin/my.py")
>>>> import my
>>>> my.test()
>>>>
>>>> my.test() calls the R script like: ['Rscript', 'myR.r']
>>>>
>>>> Fatal error: cannot open file 'myR.r': No such file or directory
>>>>
>>>> When running this notebook in Jupyter, both my.py and myR.r exist in
>>>> the same folder. I understand the story changes on Hadoop because the
>>>> scripts run in containers.
>>>>
>>>> My question: is this scenario supported in Zeppelin? How do I add an R
>>>> script to a PySpark context so that the Python script can find it?
>>>> Appreciated!
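[Editor's note] Two plausible causes of the "no package called 'changepoint'" error at the top of this thread, given how Livy works: the %livy2.sparkr and %livy2.pyspark interpreters may run as separate Livy sessions, so the package may have been installed on a different driver node than the one running the Rscript subprocess; and even on the same node, install.packages() in the SparkR session typically installs into that session's user library, which a freshly spawned Rscript does not search by default. Below is a minimal sketch of one workaround: install changepoint into a fixed directory visible on every node, then point the child Rscript at it via the R_LIBS_USER environment variable. The path /mnt/data/tmp/rlibs is hypothetical, and the sketch assumes changepoint has already been installed there.

%livy2.pyspark
import os
import subprocess
from pyspark import SparkFiles

# Ship the R script to the driver's Spark working directory, as in the
# working example earlier in the thread.
sc.addFile("hdfs:///user/zeppelin/test.r")
testpath = SparkFiles.get("test.r")

# R prepends R_LIBS_USER to .libPaths() on startup, so the child Rscript
# will search this directory first. /mnt/data/tmp/rlibs is a hypothetical
# location; changepoint must actually be installed there on whichever
# node the driver lands on.
env = dict(os.environ, R_LIBS_USER="/mnt/data/tmp/rlibs")

# subprocess.run with an explicit env, instead of subprocess.getoutput,
# which offers no way to pass environment variables.
result = subprocess.run(
    ["Rscript", testpath],
    env=env,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True,
)
print(result.stdout)

Equivalently, test.r itself could call .libPaths(c("/mnt/data/tmp/rlibs", .libPaths())) before library(changepoint). Either way, the library directory must hold the package on whichever node actually runs the Rscript, which echoes Jeff's point about R itself needing to be installed on every host.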