Re: Improve SparkR collect performance with Arrow

2017-05-15 Thread Felix Cheung
I can try to help.
From: Wes McKinney
Sent: Monday, May 15, 2017 12:49 PM
Subject: Re: Improve SparkR collect performance with Arrow
To: Dirk Eddelbuettel, Jim Hester, Hadley Wickham, Kevin Ushey
Adding Hadley and others to the conversation to advise on

Re: Improve SparkR collect performance with Arrow

2017-05-15 Thread Wes McKinney
Adding Hadley and others to the conversation to advise on the best path forward. I am happy to help with maintenance of the C++ code. For example, if there are API changes that affect the Rcpp bindings, I would help fix them. We have GLib-based C bindings and Cython bindings (Cython is like Rcpp for Python)

Re: Improve SparkR collect performance with Arrow

2017-05-15 Thread Dean Chen
Hi Wes, We can work with the Spark community on the Spark/SparkR integration. We are also happy to help with migrating the R package from Feather into Arrow. Do you have anyone in mind to manage the R/Rcpp binding issues? I reviewed the R and cpp files in https://github.com/wesm/feather/tree/master/R and we

Re: Improve SparkR collect performance with Arrow

2017-05-14 Thread Wes McKinney
Note that I just opened https://github.com/wesm/feather/pull/297, which deletes all of the Feather Python code (using pyarrow as a dependency).
On Sun, May 14, 2017 at 2:44 PM, Wes McKinney wrote:
> hi Dean,
>
> In Arrow 0.3 we incorporated the C++ and Python code from wesm/feather
> into the Arrow re

Re: Improve SparkR collect performance with Arrow

2017-05-14 Thread Wes McKinney
hi Dean, In Arrow 0.3 we incorporated the C++ and Python code from wesm/feather into the Arrow repo. The Feather format is a simplified version of the Arrow IPC format (which has file/batch and stream flavors), so the ideal approach would be to move the Feather R/Rcpp wrapper code into the Arrow c