Re: Support R in Spark

2014-09-19 Thread oppokui
Thanks, Shivaram.

Kui

> On Sep 19, 2014, at 12:58 AM, Shivaram Venkataraman wrote:
>
> As R is single-threaded, SparkR launches one R process per-executor on
> the worker side.
>
> Thanks
> Shivaram
>
> On Thu, Sep 18, 2014 at 7:49 AM, oppokui wrote:
>> Shivaram,
>>
>> As I know, SparkR

Re: Support R in Spark

2014-09-18 Thread Shivaram Venkataraman
As R is single-threaded, SparkR launches one R process per-executor on the worker side.

Thanks
Shivaram

On Thu, Sep 18, 2014 at 7:49 AM, oppokui wrote:
> Shivaram,
>
> As I know, SparkR used rJava package. In work node, spark code will execute R
> code by launching R process and send/receive by
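The exchange described in this message (a long-lived helper process that the worker side talks to by sending and receiving byte arrays) can be sketched roughly as follows. This is only an illustration, not SparkR's actual code: the child here is a Python process standing in for the R process, and the names `WorkerProcess` and `CHILD_SOURCE`, as well as the 4-byte length-prefix framing, are assumptions made for the example.

```python
import struct
import subprocess
import sys

# Hypothetical stand-in for the R worker: reads length-prefixed frames from
# stdin and answers each one with the upper-cased payload, framed the same way.
CHILD_SOURCE = r"""
import struct, sys
inp, out = sys.stdin.buffer, sys.stdout.buffer
while True:
    header = inp.read(4)
    if len(header) < 4:          # parent closed the pipe: exit cleanly
        break
    (n,) = struct.unpack(">I", header)
    payload = inp.read(n)
    result = payload.upper()
    out.write(struct.pack(">I", len(result)) + result)
    out.flush()
"""


class WorkerProcess:
    """One long-lived child process, reused for many requests (assumed design)."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", CHILD_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
        )

    def call(self, payload: bytes) -> bytes:
        # Send one frame: 4-byte big-endian length, then the raw bytes.
        self.proc.stdin.write(struct.pack(">I", len(payload)) + payload)
        self.proc.stdin.flush()
        # Read the reply frame in the same format.
        (n,) = struct.unpack(">I", self.proc.stdout.read(4))
        return self.proc.stdout.read(n)

    def close(self):
        # Closing stdin signals end of input; the child then exits its loop.
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


if __name__ == "__main__":
    worker = WorkerProcess()
    print(worker.call(b"hello from the executor"))
    worker.close()
```

Because the child handles one frame at a time, a single process serves requests sequentially, which is in line with the point above that R is single-threaded.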

Re: Support R in Spark

2014-09-18 Thread oppokui
Shivaram, As far as I know, SparkR uses the rJava package. On a worker node, the Spark code executes R code by launching an R process and sending and receiving byte arrays. My question is when the R process is launched: is there one R process per worker process, per executor thread, or per RDD being processed? Thanks and Regards

Re: Support R in Spark

2014-09-06 Thread Christopher Nguyen
Hi Kui, sorry about that. That link you mentioned is probably the one for the products. We don't have one pointing from adatao.com to ddf.io; maybe we'll add it. As for access to the code base itself, I think the team has already created a GitHub repo for it, and should open it up within a few wee

Re: Support R in Spark

2014-09-06 Thread oppokui
Thanks, Christopher. I saw it before; it is amazing. Last time I tried to download it from adatao, but I got no response after filling in the form. How can I download it or its source code? What is the license?

Kui

> On Sep 6, 2014, at 8:08 PM, Christopher Nguyen wrote:
>
> Hi Kui,
>
> DDF (open sour

Re: Support R in Spark

2014-09-06 Thread Christopher Nguyen
Hi Kui, DDF (open sourced) also aims to do something similar, adding RDBMS idioms, and is already implemented on top of Spark. One philosophy is that the DDF API aggressively hides the notion of parallel datasets, exposing only (mutable) tables to users, on which they can apply R and other famili

Re: Support R in Spark

2014-09-06 Thread oppokui
Cool! That is very good news. Can’t wait for it.

Kui

> On Sep 5, 2014, at 1:58 AM, Shivaram Venkataraman wrote:
>
> Thanks Kui. SparkR is a pretty young project, but there are a bunch of
> things we are working on. One of the main features is to expose a data
> frame API (https://sparkr.atl

Re: Support R in Spark

2014-09-04 Thread Shivaram Venkataraman
Thanks Kui. SparkR is a pretty young project, but there are a bunch of things we are working on. One of the main features is to expose a data frame API (https://sparkr.atlassian.net/browse/SPARKR-1) and we will be integrating this with Spark's MLlib. At a high level this will allow R users to use

Re: Support R in Spark

2014-09-03 Thread oppokui
Thanks, Shivaram. No specific use case yet. We are trying to use R in our project because our data scientists all know R. Our concern is how R handles massive data. Spark does a better job in the big data area, and Spark ML focuses on predictive analysis. So we are wondering whether we ca

Support R in Spark

2014-09-03 Thread oppokui
Does the Spark ML team have a plan to support R scripts natively? There is a SparkR project, but it is not from the Spark team. Spark ML uses netlib-java to talk to native Fortran routines, or uses NumPy; why not use R in a similar way? R has a lot of useful packages. If the Spark ML team can include R support, i