Hello all, I saw the notes come through from today's call:
> * R Arrow Bindings?
>   - Find use cases within the R community, contributors needed
>   - R Feather bindings a useful starting point

This year I've been working on parallel R with datasets in the 100+ GB range, and I've found that loading and saving data as text files is a real bottleneck. Another consideration is breaking the data into chunks for parallel processing while preserving metadata and overall structure. So I've been watching Parquet and Arrow.

Specifically, here are two use cases in R where Arrow / Parquet could be helpful:

- Splitting a large data set into pieces that fit comfortably in memory, then applying ordinary R functions to each piece - essentially GROUP BY.
- Matloff's Software Alchemy: statistical averaging over independent chunks of data. This requires rows to be randomly assigned to chunks.

Another option besides starting from the R Feather bindings is to start from an automatically generated set of bindings: https://github.com/duncantl/RCodeGen

Best,
Clark Fitzgerald
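P.S. To make the two use cases concrete, here is a minimal base-R sketch of both patterns. The data frame, group labels, and chunk count are made up for illustration; in practice each piece would be read from a chunked Parquet/Arrow file rather than built in memory.

```r
# Hypothetical example data; stands in for one chunk of a much larger set.
df <- data.frame(
  group = rep(c("a", "b", "c"), each = 4),
  value = 1:12
)

# Use case 1: split into pieces, then apply ordinary R functions
# to each piece - essentially GROUP BY.
pieces <- split(df, df$group)
group_sums <- sapply(pieces, function(p) sum(p$value))

# Use case 2: Software Alchemy style - rows randomly assigned to
# equal-size chunks, an estimate computed per chunk, then averaged.
set.seed(1)
chunk_id <- sample(rep(1:4, length.out = nrow(df)))
chunks <- split(df, chunk_id)
chunk_means <- sapply(chunks, function(p) mean(p$value))
avg_estimate <- mean(chunk_means)  # equals the overall mean when chunks are equal-size
```

With Arrow/Parquet, `pieces` and `chunks` would each map to a file or row group that a worker can read independently, which is where the format's metadata handling matters.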