Comments are on now, sorry about that.

On Tue, May 21, 2019, 1:06 AM Micah Kornfield <emkornfi...@gmail.com> wrote:

> Hi Wes,
> It looks like comments are turned off on the doc. Is this intentional?
>
> Thanks,
> Micah
>
> On Mon, May 20, 2019 at 3:49 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > hi folks,
> >
> > I'm interested in starting to build a so-called "data frame" interface
> > as a moderately opinionated, higher-level usability layer for
> > interacting with Arrow-based chunked in-memory data. I've had numerous
> > discussions (mostly in person) over the last few years about this, and
> > it feels to me that if we don't build something like this in Apache
> > Arrow, we could end up with several third-party efforts without
> > much community discussion or collaboration, which would be sad.
> >
> > Another anti-pattern that is occurring is that users are loading data
> > into Arrow, converting to a library like pandas in order to do some
> > simple in-memory data manipulations, then converting back to Arrow.
> > This is not the intended long-term mode of operation.
> >
> > I wrote in significantly more detail (~7-8 pages) about the context
> > and motivation for this project:
> >
> > https://docs.google.com/document/d/1XHe_j87n2VHGzEbnLe786GHbbcbrzbjgG8D0IXWAeHg/edit?usp=sharing
> >
> > Note that this would be a parallel effort to go alongside the
> > previously discussed "Query Engine" project, and the two things are
> > intended to work together. Since we are creating computational
> > kernels, this would also provide some immediacy in being able to
> > invoke kernels easily on large in-memory datasets without having to
> > wait for a more full-fledged query engine system to be developed.
> >
> > The details with these kinds of projects can be bedeviling, so my
> > approach would be to begin to lay down the core abstractions and basic
> > APIs and use the project to drive the agenda for kernel development
> > (which can also be used in the context of a query engine runtime).
> > From my past experience designing pandas and some other in-memory
> > analytics projects, I have some idea of the kinds of mistakes or
> > design patterns I would like to _avoid_ in this effort, but others may
> > have some experiences they can offer to inform the design approach as
> > well.
> >
> > Looking forward to comments and discussion.
> >
> > - Wes