Great, I'll be on the call. The first steps I took today with the automatically generated bindings from the C++ source seem promising. Much more work is required to make them usable, though.

On Mon, Jul 24, 2017 at 9:00 PM, Kevin Moore <ke...@quiltdata.io> wrote:

> A group of Quilt users and team members interested in R is planning a short
> call to get the ball rolling on R bindings for Arrow (and Quilt) tomorrow
> at 4PM Pacific. We'd love to have anyone who's interested from this list
> join us in the hangout:
> https://hangouts.google.com/hangouts/_/quiltdata.io/aneesh?authuser=1
>
> Thanks,
>
> Kevin
>
> ----
> Kevin Moore
> CEO, Quilt Data, Inc.
> ke...@quiltdata.io | LinkedIn <https://www.linkedin.com/in/kevinemoore/>
> (415) 497-7895
>
> Manage Data like Code
> quiltdata.com
>
> On Mon, Jul 24, 2017 at 7:58 AM, Wes McKinney <wesmck...@gmail.com> wrote:
>
> > + Hadley
> >
> > On Fri, Jul 21, 2017 at 2:04 PM, Bryan Cutler <cutl...@gmail.com> wrote:
> > > Thanks Clark. I know that SparkR would benefit a lot from Arrow bindings
> > > and many people would like to see that, but to my knowledge no one has
> > > started working on this yet. Please keep us updated with what you find!
> > >
> > > Bryan
> > >
> > > On Fri, Jul 21, 2017 at 9:15 AM, Clark Fitzgerald <clarkfi...@gmail.com>
> > > wrote:
> > >
> > >> Regarding the R Consortium, the Distributed Computing Working Group led by
> > >> Michael Lawrence would be interested in this. It would be nice to go to
> > >> them with some working examples and use cases.
> > >>
> > >> Next week I will start looking into R / Arrow bindings. A couple other
> > >> people at the UC Davis Data Science Initiative have expressed interest as
> > >> well. I'll post updates here.
> > >>
> > >> On Wed, Jul 19, 2017 at 5:01 PM, Dean Chen <d...@dv01.co> wrote:
> > >>
> > >> > Sounds good, will get a thread going there.
> > >> >
> > >> > On Wed, Jul 19, 2017 at 6:02 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >> >
> > >> > > Especially with Arrow support landing in Spark (SPARK-13534), it would
> > >> > > be helpful to combine efforts between Python and R on this front. I
> > >> > > also have a long list of improvements to the Feather format that will
> > >> > > be substantially simpler once library(feather) is depending on the
> > >> > > main Arrow libraries.
> > >> > >
> > >> > > I suggest you reach out to members of the R community directly on
> > >> > > public forums about development help / advice and soliciting
> > >> > > collaboration. There are other R venues where you can describe your
> > >> > > use cases, like the R Consortium and its subcommittees:
> > >> > > https://www.r-consortium.org/. I would go directly to the mailing
> > >> > > lists and see if there is anyone who would like to get involved. It's
> > >> > > more likely that you'll get attention on this problem in the R mailing
> > >> > > lists than on the Arrow mailing list due to the chicken-and-egg
> > >> > > aspect.
> > >> > >
> > >> > > As a side note, my opinion is that shared storage, memory formats, and
> > >> > > computing libraries (e.g. native C++ libraries targeting Arrow memory)
> > >> > > are going to be more and more important to the R / Python / Julia
> > >> > > communities (and beyond -- Kou has been developing Arrow interfaces
> > >> > > for Ruby, which has not traditionally had a large data science
> > >> > > community) as time passes. I would like to personally do more on the R
> > >> > > side but I simply don't have the bandwidth to take responsibility for
> > >> > > another major component, especially not in an unfamiliar software
> > >> > > development stack.
> > >> > >
> > >> > > Let me know how I can help, and if there are R mailing list
> > >> > > discussions where we (the Arrow developers) can chime in please alert
> > >> > > us to them here.
> > >> > >
> > >> > > - Wes
> > >> > >
> > >> > > On Wed, Jul 19, 2017 at 5:29 PM, Dean Chen <d...@dv01.co> wrote:
> > >> > > > I also sent a note about it to the dev list a month ago. Still have a huge
> > >> > > > internal need and interested in helping push this along where we can.
> > >> > > > Unfortunately, our team is more focused around Spark and doesn't have much
> > >> > > > experience working with the R community.
> > >> > > >
> > >> > > > On Wed, Jul 19, 2017 at 1:44 PM Clark Fitzgerald <clarkfi...@gmail.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > >> Hello all,
> > >> > > >>
> > >> > > >> I saw the notes come through from today's call:
> > >> > > >>
> > >> > > >> > * R Arrow Bindings?
> > >> > > >> > - Find use cases within the R community, contributors needed
> > >> > > >> > - R Feather bindings a useful starting point
> > >> > > >>
> > >> > > >> This year I've been working on parallel R on datasets in the 100+ GB range,
> > >> > > >> and have found that loading and saving data from text files is a real
> > >> > > >> bottleneck. Another consideration is breaking the data up into chunks for
> > >> > > >> parallel processing while maintaining metadata and overall structure. So
> > >> > > >> I've been watching Parquet and Arrow.
> > >> > > >>
> > >> > > >> Specifically here are two use cases in R where Arrow / Parquet could be
> > >> > > >> helpful:
> > >> > > >>
> > >> > > >> - Splitting up a large data set into pieces which fit comfortably in memory
> > >> > > >> then applying normal R functions to each piece. Basically GROUP BY.
> > >> > > >> - Matloff's Software Alchemy, statistical averaging based on independent
> > >> > > >> chunks of data. This requires rows to be randomly assigned to chunks.
> > >> > > >>
> > >> > > >> Another option besides starting from the R Feather bindings is to start
> > >> > > >> with an automatically generated set of bindings:
> > >> > > >> https://github.com/duncantl/RCodeGen
> > >> > > >>
> > >> > > >> Best,
> > >> > > >> Clark Fitzgerald
> > >> > > >>
> > >> > > > --
> > >> > > > VP of Engineering - dv01, Featured in Forbes Fintech 50 For 2016
> > >> > > > <http://www.forbes.com/fintech/2016/#310668d56680>
> > >> > > > 915 Broadway | Suite 502 | New York, NY 10010
> > >> > > > (646)-838-2310
> > >> > > > d...@dv01.co | www.dv01.co
> > >> > >
> > >> > --
> > >> > VP of Engineering - dv01, Featured in Forbes Fintech 50 For 2016
> > >> > <http://www.forbes.com/fintech/2016/#310668d56680>
> > >> > 915 Broadway | Suite 502 | New York, NY 10010
> > >> > (646)-838-2310
> > >> > d...@dv01.co | www.dv01.co
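
For anyone skimming the thread, the two use cases quoted above (split the data and apply a function per piece, and chunk averaging over randomly assigned rows) can be sketched roughly as follows. This is only an illustration in plain Python rather than R, and it does not use Arrow or Parquet at all. The function names `random_chunks`, `chunk_average`, and `split_apply` are made up for this sketch, and the in-memory list stands in for data that would really live in files too large to load at once.

```python
import random
import statistics


def random_chunks(rows, n_chunks, seed=0):
    """Randomly assign each row to one of n_chunks chunks of near-equal size."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Striding over the shuffled rows gives chunk sizes that differ by at most 1.
    return [shuffled[i::n_chunks] for i in range(n_chunks)]


def chunk_average(rows, estimator, n_chunks, seed=0):
    """Apply estimator to each random chunk independently, then average the results."""
    chunks = random_chunks(rows, n_chunks, seed)
    estimates = [estimator(chunk) for chunk in chunks]
    return statistics.fmean(estimates)


def split_apply(rows, key, func):
    """GROUP BY: split rows by key(row), then apply func to each group."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row)
    return {k: func(group) for k, group in groups.items()}


if __name__ == "__main__":
    data = [float(x) for x in range(1, 101)]
    # Chunk-averaged median over 4 random chunks (hypothetical toy data).
    print(chunk_average(data, statistics.median, n_chunks=4))
    # Row counts per group, grouped by parity.
    print(split_apply(data, key=lambda x: int(x) % 2, func=len))
```

In a real setting each chunk would be read from its own file and processed by a separate worker, which is where a fast binary format would matter; the sketch only shows the control flow.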