Thanks Jeff. This worked:
%livy2.pyspark
from pyspark import SparkFiles
import subprocess

sc.addFile("hdfs:///user/zeppelin/ocic/test.r")
testpath = SparkFiles.get('test.r')
stdoutdata = subprocess.getoutput("Rscript " + testpath)
print(stdoutdata)

Cheers!

On Tue, Aug 28, 2018 at 6:09 PM Jeff Zhang <zjf...@gmail.com> wrote:

> Do you run it under yarn-cluster mode? Then you must ensure your R script
> is shipped to that driver (via sc.addFile or setting livy.spark.files).
>
> You also need to make sure you have R installed on all hosts of the yarn
> cluster, because the driver may run on any node of the cluster.
>
> Lian Jiang <jiangok2...@gmail.com> wrote on Wed, Aug 29, 2018 at 1:35 AM:
>
>> Thanks Lucas. We tried and got the same error. Below is the code:
>>
>> %livy2.pyspark
>> import subprocess
>> sc.addFile("hdfs:///user/zeppelin/test.r")
>> stdoutdata = subprocess.getoutput("Rscript test.r")
>> print(stdoutdata)
>>
>> Fatal error: cannot open file 'test.r': No such file or directory
>>
>> sc.addFile adds test.r to the Spark context. However, subprocess does not
>> use the Spark context.
>>
>> An HDFS path does not work either:
>> subprocess.getoutput("Rscript hdfs:///user/zeppelin/test.r")
>>
>> Any idea how to make Python call an R script? Appreciated!
>>
>> On Tue, Aug 28, 2018 at 1:13 AM Partridge, Lucas (GE Aviation) <
>> lucas.partri...@ge.com> wrote:
>>
>>> Have you tried SparkContext.addFile() (not addPyFile()) to add your R
>>> script?
>>>
>>> https://spark.apache.org/docs/2.2.0/api/python/pyspark.html#pyspark.SparkContext.addFile
>>>
>>> *From:* Lian Jiang <jiangok2...@gmail.com>
>>> *Sent:* 27 August 2018 22:42
>>> *To:* users@zeppelin.apache.org
>>> *Subject:* EXT: Python script calls R script in Zeppelin on Hadoop
>>>
>>> Hi,
>>>
>>> We are using HDP 3.0 (with Zeppelin 0.8.0) and are migrating Jupyter
>>> notebooks to Zeppelin. One issue we came across is that a Python script
>>> calling an R script does not work in Zeppelin.
>>>
>>> %livy2.pyspark
>>> import os
>>> sc.addPyFile("hdfs:///user/zeppelin/my.py")
>>> import my
>>> my.test()
>>>
>>> my.test() calls the R script like: ['Rscript', 'myR.r']
>>>
>>> Fatal error: cannot open file 'myR.r': No such file or directory
>>>
>>> When running this notebook in Jupyter, both my.py and myR.r exist in the
>>> same folder. I understand the story changes on Hadoop because the scripts
>>> run in containers.
>>>
>>> My question: is this scenario supported in Zeppelin? How do I add an R
>>> script into a Python Spark context so that the Python script can find
>>> the R script? Appreciated!
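For anyone landing on this thread later: the fix works because subprocess resolves a bare filename like "test.r" against the child process's working directory, whereas SparkFiles.get() returns the absolute local path where sc.addFile() materialised the file on that node. Below is a minimal sketch of just that path-resolution point, runnable without Spark or R installed; it uses a throwaway Python script as a stand-in for the R script, and the temp-dir path is purely illustrative:

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for the Spark pattern: materialise a script at a known local
# path (analogous to what SparkFiles.get() returns after sc.addFile()),
# then invoke the interpreter with that ABSOLUTE path. Passing only the
# bare filename would make the child resolve it against its own cwd,
# which is exactly the "No such file or directory" failure in the thread.
with tempfile.TemporaryDirectory() as workdir:
    script_path = os.path.join(workdir, "test.py")  # plays the role of test.r
    with open(script_path, "w") as f:
        f.write('print("hello from the child script")\n')
    # Equivalent in shape to: subprocess.getoutput("Rscript " + testpath)
    output = subprocess.getoutput('"%s" "%s"' % (sys.executable, script_path))
    print(output)
```

The same shape applies on the cluster: ship the file with sc.addFile(), recover its local path with SparkFiles.get(), and hand that path to Rscript.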