Re: Python script calls R script in Zeppelin on Hadoop

2018-08-31 Thread Lian Jiang
Thanks Jeff. Problem solved by installing the R packages into /usr/lib64/R/library (the default lib path) on each datanode. Your clue helped!
On Wed, Aug 29, 2018 at 7:40 PM Jeff Zhang wrote:
> I am not sure what's wrong. maybe you can ssh to that machine and run this
> r script manually first
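For anyone checking the same thing on their own nodes, here is a minimal sketch (not from the thread) that asks R which library directories it searches and checks whether a package directory is present in one of them. It assumes Python 3.7+ and that `Rscript` is on the PATH; the helper names are hypothetical.

```python
import os
import shutil
import subprocess


def parse_lib_paths(output):
    """Turn the newline-separated output of .libPaths() into a list of paths."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def r_lib_paths(rscript="Rscript"):
    """Return the library directories R searches, or [] if Rscript is not found."""
    if shutil.which(rscript) is None:
        return []
    result = subprocess.run(
        [rscript, "-e", 'cat(.libPaths(), sep = "\\n")'],
        capture_output=True,
        text=True,
    )
    return parse_lib_paths(result.stdout)


def package_dir_exists(lib_dir, package):
    """An installed R package is a directory containing a DESCRIPTION file."""
    return os.path.isfile(os.path.join(lib_dir, package, "DESCRIPTION"))
```

On a node where the fix above has been applied, one would expect `/usr/lib64/R/library` to appear in `r_lib_paths()` and `package_dir_exists("/usr/lib64/R/library", "changepoint")` to be true.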

Re: Python script calls R script in Zeppelin on Hadoop

2018-08-29 Thread Jeff Zhang
I am not sure what's wrong. Maybe you can SSH to that machine and run this R script manually first to verify what's wrong.
On Thu, Aug 30, 2018 at 10:34 AM Lian Jiang wrote:
> Jeff,
>
> R is installed on namenode and all data nodes. The R packages have been
> copied to them all too. I am not sure if an R scri

Re: Python script calls R script in Zeppelin on Hadoop

2018-08-29 Thread Lian Jiang
Jeff, R is installed on the namenode and all data nodes. The R packages have been copied to all of them too. I am not sure whether an R script launched by pyspark's subprocess can access the Spark context. If not, using addFile to add R packages to the Spark context will not help. test.r install the packages

Re: Python script calls R script in Zeppelin on Hadoop

2018-08-29 Thread Jeff Zhang
You need to make sure the Spark driver machine has this package installed. And since you are using yarn-cluster mode via Livy, you have to install this package on all nodes, because the Spark driver could be launched on any node of the cluster.
On Thu, Aug 30, 2018 at 1:46 AM Lian Jiang wrote:
> After calli
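Since the driver can land on any node, one way to check every node is to run the same package test over SSH. The sketch below only builds the command for each host; the host names are hypothetical, and it assumes passwordless SSH and `Rscript` on the remote PATH, neither of which is stated in the thread.

```python
import shlex


def package_check_expr(package):
    """R expression that exits 0 if the package can be loaded, 1 otherwise."""
    return (
        'quit(status = if (requireNamespace("%s", quietly = TRUE)) 0 else 1)'
        % package
    )


def remote_check_command(host, package):
    """Build an ssh command list that runs the check on one host.

    ssh hands the remote command to a shell, so the R expression is
    shell-quoted before being appended.
    """
    remote = "Rscript -e " + shlex.quote(package_check_expr(package))
    return ["ssh", host, remote]


# Hypothetical host names, for illustration only.
HOSTS = ["datanode1", "datanode2"]
COMMANDS = [remote_check_command(h, "changepoint") for h in HOSTS]
```

Each entry in `COMMANDS` can be passed to `subprocess.run`; a non-zero return code would point at a node where the package is missing.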

Re: Python script calls R script in Zeppelin on Hadoop

2018-08-29 Thread Lian Jiang
After calling a sample R script, we found another issue when running a real R script. This R script failed to load the changepoint library. I tried:
    %livy2.sparkr
    install.packages("changepoint", repos="file:///mnt/data/tmp/r")
    library(changepoint)
    // I see "Successfully loaded changepoint package ver

Re: Python script calls R script in Zeppelin on Hadoop

2018-08-28 Thread Lian Jiang
Thanks Jeff. This worked:
    %livy2.pyspark
    from pyspark import SparkFiles
    import subprocess
    sc.addFile("hdfs:///user/zeppelin/ocic/test.r")
    testpath = SparkFiles.get('test.r')
    stdoutdata = subprocess.getoutput("Rscript " + testpath)
    print(stdoutdata)
Cheers!
On Tue, Aug 28, 2018 at 6:09 PM Jeff
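A slightly more defensive variant of the subprocess call above, as a sketch only: it passes the command as a list instead of a concatenated string (so paths with spaces survive) and raises if the script exits with a non-zero status. The function name `run_script` is hypothetical, and `capture_output` assumes Python 3.7 or later.

```python
import subprocess


def run_script(script_path, interpreter="Rscript", args=()):
    """Run a script with the given interpreter and return its stdout.

    Raises RuntimeError with the captured stderr if the exit code is non-zero.
    """
    cmd = [interpreter, script_path, *args]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            "%s exited with %d: %s"
            % (interpreter, result.returncode, result.stderr.strip())
        )
    return result.stdout


# In the notebook paragraph this would be used roughly as:
#   print(run_script(SparkFiles.get("test.r")))
```

Unlike `subprocess.getoutput`, this keeps stderr separate from stdout, which makes a failed `library(...)` call easier to spot.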

Re: Python script calls R script in Zeppelin on Hadoop

2018-08-28 Thread Jeff Zhang
Do you run it under yarn-cluster mode? Then you must ensure your R script is shipped to the driver (via sc.addFile or by setting livy.spark.files). You also need to make sure R is installed on all hosts of the YARN cluster, because the driver may run on any node of the cluster. Lian Jiang, in August 2018

Re: Python script calls R script in Zeppelin on Hadoop

2018-08-28 Thread Lian Jiang
Thanks Lucas. We tried and got the same error. Below is the code:
    %livy2.pyspark
    import subprocess
    sc.addFile("hdfs:///user/zeppelin/test.r")
    stdoutdata = subprocess.getoutput("Rscript test.r")
    print(stdoutdata)

    Fatal error: cannot open file 'test.r': No such file or directory
sc.addFile adds t
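The error is consistent with `Rscript` looking for `test.r` relative to the process's working directory, while files added with `sc.addFile` are placed in a Spark-managed directory (the one reported by `SparkFiles.getRootDirectory()`). A minimal sketch of resolving the full path and failing early if the file is not there; the helper name is hypothetical, and the directory is passed in explicitly so the snippet runs without Spark:

```python
import os


def resolve_added_file(root_dir, filename):
    """Return the absolute path of filename under root_dir, or raise if missing.

    In a pyspark paragraph, root_dir would come from
    SparkFiles.getRootDirectory(); SparkFiles.get(filename) performs the
    same join in a single call.
    """
    path = os.path.join(root_dir, filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            "%r not found under %r; was it added with sc.addFile?"
            % (filename, root_dir)
        )
    return os.path.abspath(path)
```

Passing the resolved path to `Rscript`, rather than the bare file name, avoids depending on the working directory.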

RE: Python script calls R script in Zeppelin on Hadoop

2018-08-28 Thread Partridge, Lucas (GE Aviation)
Have you tried SparkContext.addFile() (not addPyFile()) to add your R script?
https://spark.apache.org/docs/2.2.0/api/python/pyspark.html#pyspark.SparkContext.addFile
From: Lian Jiang
Sent: 27 August 2018 22:42
To: users@zeppelin.apache.org
Subject: EXT: Python script calls R script in Zeppelin o