Jeff, R is installed on the namenode and all data nodes, and the R packages have been copied to all of them too. I am not sure whether an R script launched by pyspark's subprocess can access the Spark context; if it cannot, using sc.addFile to add the R packages to the Spark context will not help test.r load them. Thanks for the clue.
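One way to sidestep the Spark-context question entirely is to point the Rscript child process at the R library that install.packages() wrote to, via the R_LIBS environment variable. A minimal sketch, assuming Python 3; the library path below is a placeholder, not something confirmed in this thread, so substitute whatever .libPaths() reports in the %livy2.sparkr session (quoted below) that installed changepoint:

%livy2.pyspark
import os
import subprocess
from pyspark import SparkFiles

sc.addFile("hdfs:///user/zeppelin/test.r")
testpath = SparkFiles.get('test.r')

# R prepends R_LIBS to its library search path, so the Rscript child
# process can find changepoint without touching the Spark context.
# The path is an assumed placeholder; check .libPaths() in %livy2.sparkr.
os.environ["R_LIBS"] = "/usr/lib64/R/library"

stdoutdata = subprocess.getoutput("Rscript " + testpath)
print(stdoutdata)

Since os.environ is inherited by the subprocess, test.r itself needs no changes. Note this only helps on the node where the packages were actually installed, which is Jeff's point about yarn-cluster mode below.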
On Wed, Aug 29, 2018 at 7:24 PM Jeff Zhang <zjf...@gmail.com> wrote:

> You need to make sure the Spark driver machine has this package
> installed. And since you are using yarn-cluster mode via Livy, you have
> to install this package on all nodes, because the Spark driver could be
> launched on any node of the cluster.
>
> Lian Jiang <jiangok2...@gmail.com> wrote on Thursday, August 30, 2018 at 1:46 AM:
>
>> After calling a sample R script successfully, we found another issue
>> when running a real R script: it failed to load the changepoint library.
>>
>> I tried:
>>
>> %livy2.sparkr
>> install.packages("changepoint", repos="file:///mnt/data/tmp/r")
>> library(changepoint)  # I see "Successfully loaded changepoint package version 2.2.2"
>>
>> %livy2.pyspark
>> from pyspark import SparkFiles
>> import subprocess
>>
>> sc.addFile("hdfs:///user/zeppelin/test.r")
>> testpath = SparkFiles.get('test.r')
>> stdoutdata = subprocess.getoutput("Rscript " + testpath)
>> print(stdoutdata)
>>
>> The error: Error in library(changepoint) : there is no package called
>> ‘changepoint’
>>
>> test.r is simply:
>>
>> library(changepoint)
>>
>> Any idea how to make changepoint available to the R script? Thanks.
>>
>> On Tue, Aug 28, 2018 at 10:07 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>>
>>> Thanks Jeff.
>>>
>>> This worked:
>>>
>>> %livy2.pyspark
>>> from pyspark import SparkFiles
>>> import subprocess
>>>
>>> sc.addFile("hdfs:///user/zeppelin/ocic/test.r")
>>> testpath = SparkFiles.get('test.r')
>>> stdoutdata = subprocess.getoutput("Rscript " + testpath)
>>> print(stdoutdata)
>>>
>>> Cheers!
>>>
>>> On Tue, Aug 28, 2018 at 6:09 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>
>>>> Do you run it in yarn-cluster mode? Then you must ensure your R script
>>>> is shipped to the driver (via sc.addFile or by setting livy.spark.files).
>>>>
>>>> You also need to make sure R is installed on all hosts of the YARN
>>>> cluster, because the driver may run on any node.
>>>>
>>>> Lian Jiang <jiangok2...@gmail.com> wrote on Wednesday, August 29, 2018 at 1:35 AM:
>>>>
>>>>> Thanks Lucas. We tried and got the same error. Below is the code:
>>>>>
>>>>> %livy2.pyspark
>>>>> import subprocess
>>>>> sc.addFile("hdfs:///user/zeppelin/test.r")
>>>>> stdoutdata = subprocess.getoutput("Rscript test.r")
>>>>> print(stdoutdata)
>>>>>
>>>>> Fatal error: cannot open file 'test.r': No such file or directory
>>>>>
>>>>> sc.addFile adds test.r to the Spark context. However, the subprocess
>>>>> does not use the Spark context, so "Rscript test.r" cannot find the file.
>>>>>
>>>>> An HDFS path does not work either:
>>>>> subprocess.getoutput("Rscript hdfs:///user/zeppelin/test.r")
>>>>>
>>>>> Any idea how to make Python call an R script? Appreciated!
>>>>>
>>>>> On Tue, Aug 28, 2018 at 1:13 AM Partridge, Lucas (GE Aviation) <
>>>>> lucas.partri...@ge.com> wrote:
>>>>>
>>>>>> Have you tried SparkContext.addFile() (not addPyFile()) to add your
>>>>>> R script?
>>>>>>
>>>>>> https://spark.apache.org/docs/2.2.0/api/python/pyspark.html#pyspark.SparkContext.addFile
>>>>>>
>>>>>> *From:* Lian Jiang <jiangok2...@gmail.com>
>>>>>> *Sent:* 27 August 2018 22:42
>>>>>> *To:* users@zeppelin.apache.org
>>>>>> *Subject:* EXT: Python script calls R script in Zeppelin on Hadoop
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We are using HDP 3.0 (with Zeppelin 0.8.0) and are migrating Jupyter
>>>>>> notebooks to Zeppelin. One issue we came across is that a Python script
>>>>>> calling an R script does not work in Zeppelin.
>>>>>>
>>>>>> %livy2.pyspark
>>>>>> import os
>>>>>> sc.addPyFile("hdfs:///user/zeppelin/my.py")
>>>>>> import my
>>>>>> my.test()
>>>>>>
>>>>>> my.test() invokes the R script like: ['Rscript', 'myR.r']
>>>>>>
>>>>>> Fatal error: cannot open file 'myR.r': No such file or directory
>>>>>>
>>>>>> When running this notebook in Jupyter, both my.py and myR.r exist in
>>>>>> the same folder. I understand the story changes on Hadoop because the
>>>>>> scripts run in containers.
>>>>>>
>>>>>> My question: is this scenario supported in Zeppelin? How can I add an
>>>>>> R script to the PySpark context so that the Python script can find it?
>>>>>> Appreciated!
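Putting the later replies together, the pattern that eventually worked was to ship the file with sc.addFile() and resolve its local path with SparkFiles.get() before invoking Rscript. A minimal sketch of how the original my.py/myR.r scenario could be wired up the same way; the rewritten my.test() body is an assumption based on that pattern, not code from the thread:

%livy2.pyspark
sc.addPyFile("hdfs:///user/zeppelin/my.py")
sc.addFile("hdfs:///user/zeppelin/myR.r")  # ship the R script alongside my.py
import my
my.test()

where my.py resolves the shipped copy instead of a relative path:

import subprocess
from pyspark import SparkFiles

def test():
    # SparkFiles.get() returns the local path that sc.addFile() copied
    # myR.r to in this container, so Rscript can open it regardless of
    # the working directory.
    rpath = SparkFiles.get('myR.r')
    print(subprocess.getoutput("Rscript " + rpath))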