Thanks Jeff. Problem solved by installing the R packages into /usr/lib64/R/library (the default lib path) on each data node. Your clue helped!
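For anyone who lands on this thread later, here is a rough sketch of the kind of spot check that can confirm the install. This is not code from the thread; it assumes Rscript is on the PATH of every node and that sc is available in the %livy2.pyspark paragraph as in the snippets below:

%livy2.pyspark
# Sketch only: spot-check that the 'changepoint' package resolves from the
# default R library (/usr/lib64/R/library) on the nodes that run tasks.
# Assumes Rscript is on the PATH of each node.
import subprocess

def has_changepoint(_):
    cmd = "Rscript -e 'cat(\"changepoint\" %in% rownames(installed.packages()))'"
    return subprocess.getoutput(cmd)

# Use several partitions so the check lands on multiple executors.
print(sc.parallelize(range(8), 8).map(has_changepoint).collect())

Each task should print TRUE once the package is in the default lib path on that node.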
On Wed, Aug 29, 2018 at 7:40 PM Jeff Zhang <zjf...@gmail.com> wrote:

> I am not sure what's wrong. Maybe you can ssh to that machine and run the
> R script manually first to verify what's wrong.
>
> Lian Jiang <jiangok2...@gmail.com> wrote on Thu, Aug 30, 2018 at 10:34 AM:
>
>> Jeff,
>>
>> R is installed on the namenode and all data nodes. The R packages have been
>> copied to them all too. I am not sure whether an R script launched by pyspark's
>> subprocess can access the spark context or not. If not, using addFile to add
>> R packages into the spark context will not help test.r install the packages.
>> Thanks for the clue.
>>
>> On Wed, Aug 29, 2018 at 7:24 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> You need to make sure the spark driver machine has this package
>>> installed. And since you are using yarn-cluster mode via livy, you have to
>>> install these packages on all nodes because the spark driver could be
>>> launched on any node of the cluster.
>>>
>>> Lian Jiang <jiangok2...@gmail.com> wrote on Thu, Aug 30, 2018 at 1:46 AM:
>>>
>>>> After calling a sample R script, we found another issue when running a
>>>> real R script. This R script failed to load the changepoint library.
>>>>
>>>> I tried:
>>>>
>>>> %livy2.sparkr
>>>> install.packages("changepoint", repos="file:///mnt/data/tmp/r")
>>>> library(changepoint) // I see "Successfully loaded changepoint package version 2.2.2"
>>>>
>>>> %livy2.pyspark
>>>> from pyspark import SparkFiles
>>>> import subprocess
>>>>
>>>> sc.addFile("hdfs:///user/zeppelin/test.r")
>>>> testpath = SparkFiles.get('test.r')
>>>> stdoutdata = subprocess.getoutput("Rscript " + testpath)
>>>> print(stdoutdata)
>>>>
>>>> The error: Error in library(changepoint) : there is no package called 'changepoint'
>>>>
>>>> test.r is simply:
>>>>
>>>> library(changepoint)
>>>>
>>>> Any idea how to make changepoint available for the R script? Thanks.
>>>>
>>>> On Tue, Aug 28, 2018 at 10:07 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>>>>
>>>>> Thanks Jeff.
>>>>>
>>>>> This worked:
>>>>>
>>>>> %livy2.pyspark
>>>>> from pyspark import SparkFiles
>>>>> import subprocess
>>>>>
>>>>> sc.addFile("hdfs:///user/zeppelin/ocic/test.r")
>>>>> testpath = SparkFiles.get('test.r')
>>>>> stdoutdata = subprocess.getoutput("Rscript " + testpath)
>>>>> print(stdoutdata)
>>>>>
>>>>> Cheers!
>>>>>
>>>>> On Tue, Aug 28, 2018 at 6:09 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>
>>>>>> Do you run it under yarn-cluster mode? Then you must ensure your
>>>>>> R script is shipped to the driver (via sc.addFile or setting livy.spark.files).
>>>>>>
>>>>>> And you also need to make sure R is installed on all hosts of the
>>>>>> yarn cluster, because the driver may run on any node of the cluster.
>>>>>>
>>>>>> Lian Jiang <jiangok2...@gmail.com> wrote on Wed, Aug 29, 2018 at 1:35 AM:
>>>>>>
>>>>>>> Thanks Lucas. We tried and got the same error. Below is the code:
>>>>>>>
>>>>>>> %livy2.pyspark
>>>>>>> import subprocess
>>>>>>> sc.addFile("hdfs:///user/zeppelin/test.r")
>>>>>>> stdoutdata = subprocess.getoutput("Rscript test.r")
>>>>>>> print(stdoutdata)
>>>>>>>
>>>>>>> Fatal error: cannot open file 'test.r': No such file or directory
>>>>>>>
>>>>>>> sc.addFile adds test.r to the spark context. However, subprocess does
>>>>>>> not use the spark context.
>>>>>>>
>>>>>>> An hdfs path does not work either: subprocess.getoutput("Rscript hdfs:///user/zeppelin/test.r")
>>>>>>>
>>>>>>> Any idea how to make Python call an R script? Appreciate it!
>>>>>>> On Tue, Aug 28, 2018 at 1:13 AM Partridge, Lucas (GE Aviation) <lucas.partri...@ge.com> wrote:
>>>>>>>
>>>>>>>> Have you tried SparkContext.addFile() (not addPyFile()) to add your
>>>>>>>> R script?
>>>>>>>>
>>>>>>>> https://spark.apache.org/docs/2.2.0/api/python/pyspark.html#pyspark.SparkContext.addFile
>>>>>>>>
>>>>>>>> From: Lian Jiang <jiangok2...@gmail.com>
>>>>>>>> Sent: 27 August 2018 22:42
>>>>>>>> To: users@zeppelin.apache.org
>>>>>>>> Subject: EXT: Python script calls R script in Zeppelin on Hadoop
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We are using HDP 3.0 (with Zeppelin 0.8.0) and are migrating Jupyter
>>>>>>>> notebooks to Zeppelin. One issue we came across is that a Python
>>>>>>>> script calling an R script does not work in Zeppelin.
>>>>>>>>
>>>>>>>> %livy2.pyspark
>>>>>>>> import os
>>>>>>>> sc.addPyFile("hdfs:///user/zeppelin/my.py")
>>>>>>>> import my
>>>>>>>> my.test()
>>>>>>>>
>>>>>>>> my.test() calls the R script like: ['Rscript', 'myR.r']
>>>>>>>>
>>>>>>>> Fatal error: cannot open file 'myR.r': No such file or directory
>>>>>>>>
>>>>>>>> When running this notebook in Jupyter, both my.py and myR.r exist
>>>>>>>> in the same folder. I understand the story changes on Hadoop because
>>>>>>>> the scripts run in containers.
>>>>>>>>
>>>>>>>> My question:
>>>>>>>> Is this scenario supported in Zeppelin? How can I add an R script into
>>>>>>>> a Python spark context so that the Python script can find the R script?
>>>>>>>> Appreciate it!