There is some new SparkR functionality coming in Spark 2.0, such as "dapply". You could use SparkR to load a Parquet file and then run "dapply" to apply a function to each partition of a DataFrame.
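For example, a minimal sketch of that pattern (the Parquet path, column name "value", and the output schema here are placeholders, not from the original question):

library(SparkR)
sparkR.session()   # Spark 2.0 entry point

# Load the Parquet file as a Spark DataFrame (path is a placeholder)
df <- read.parquet("/path/to/data.parquet")

# Schema of the R data.frame returned by the function below
outSchema <- structType(structField("value", "double"),
                        structField("doubled", "double"))

# dapply runs the R function once per partition; each partition arrives
# as a local R data.frame and the function must return a data.frame
# matching outSchema
result <- dapply(df, function(pdf) {
  data.frame(value = pdf$value, doubled = pdf$value * 2)
}, outSchema)

head(collect(result))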
Info about loading a Parquet file:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-docs/sparkr.html#from-data-sources

API doc for "dapply":
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-docs/api/R/index.html

Xinh

On Wed, Jun 29, 2016 at 6:54 AM, sujeet jog <sujeet....@gmail.com> wrote:
> Try Spark pipeRDDs: you can invoke the R script from pipe and push the
> data you want to work on to the R script's stdin.
>
> On Wed, Jun 29, 2016 at 7:10 PM, Gilad Landau <gilad.lan...@clicktale.com>
> wrote:
>
>> Hello,
>>
>> I want to use R code as part of a Spark application (the same way I would
>> do with Scala/Python). I want to be able to run R syntax as a map
>> function on a big Spark DataFrame loaded from a Parquet file.
>>
>> Is this even possible, or is the only way to use R as part of an RStudio
>> orchestration of our Spark cluster?
>>
>> Thanks for the help!
>>
>> Gilad
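For the pipe-based approach sujeet suggests above, the R side is just a script that reads lines from stdin and writes lines to stdout; Spark's pipe() sends each RDD element as one input line and turns each output line into an element of the resulting RDD. A minimal sketch of such a script (the name double_values.R and the assumption that each element is a single number are mine, for illustration):

#!/usr/bin/env Rscript
# Hypothetical script, e.g. invoked as rdd.pipe("Rscript double_values.R")
input <- file("stdin")
open(input)
while (length(line <- readLines(input, n = 1)) > 0) {
  x <- as.numeric(line)     # assumes each element is a single number
  cat(x * 2, sep = "\n")    # emit the transformed value
}
close(input)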