Re: Proposal for arrow DataFrame low level structure and primitives (Was: Two proposals for expanding arrow Table API (virtual arrays and random access))

2020-08-05 Thread Radu Teodorescu
> I will have a closer look and comment most likely next week. Thank you! > > Unfortunately, having code developed in external repositories increases the > complexity of importing that code back into the Apache project Not sure if > you’re interested in preemptively following the project’s st

Re: Proposal for arrow DataFrame low level structure and primitives (Was: Two proposals for expanding arrow Table API (virtual arrays and random access))

2020-08-05 Thread Wes McKinney
I will have a closer look and comment most likely next week. Unfortunately, having code developed in external repositories increases the complexity of importing that code back into the Apache project. Not sure if you’re interested in preemptively following the project’s style guide (file naming, C

Re: Proposal for arrow DataFrame low level structure and primitives (Was: Two proposals for expanding arrow Table API (virtual arrays and random access))

2020-08-05 Thread Radu Teodorescu
Wes & crew, Congratulations and thank you for the successful 1.0 rollout; it is certainly making a huge difference for my day job! Is it a good time now to revive the conversation below (and https://github.com/apache/arrow/pull/7548)? I have also gone ahead and released a prototype that covers

Re: Proposal for arrow DataFrame low level structure and primitives (Was: Two proposals for expanding arrow Table API (virtual arrays and random access))

2020-06-25 Thread Radu Teodorescu
Understood and agreed. My proposal really addresses a number of mechanisms on layer 2 ("Virtual" tables) in your taxonomy (I can adjust interface names accordingly as part of the review process). One additional element I am proposing here is the ability to insert and modify rows in a vectorized

Re: Proposal for arrow DataFrame low level structure and primitives (Was: Two proposals for expanding arrow Table API (virtual arrays and random access))

2020-06-25 Thread Wes McKinney
hi Radu, It's going to be challenging for me to review in detail until after the 1.0.0 release is out, but in general I think there are 3 layers that we need to be talking about: * Materialized in-memory tables * "Virtual" tables, whose in-memory/not-in-memory semantics are not exposed -- permitt

Proposal for arrow DataFrame low level structure and primitives (Was: Two proposals for expanding arrow Table API (virtual arrays and random access))

2020-06-25 Thread Radu Teodorescu
Here it is as a pull request: https://github.com/apache/arrow/pull/7548 I hope this can be a starter for an active conversation diving into specifics, and I look forward to contributing more design and algorithm ideas as well as concrete code. > O

Re: Two proposals for expanding arrow Table API (virtual arrays and random access)

2020-06-17 Thread Neal Richardson
Maybe a draft pull request? If you put "WIP" in the pull request title, CI won't run builds on it, so it's suitable for rough outlines and collecting feedback. Neal On Wed, Jun 17, 2020 at 2:57 PM Radu Teodorescu wrote: > Thank you Wes! > Yes, both proposals fit very nicely in your Data Frames

Re: Two proposals for expanding arrow Table API (virtual arrays and random access)

2020-06-17 Thread Radu Teodorescu
Thank you Wes! Yes, both proposals fit very nicely in your Data Frames vision; I see them as deep dives on some specifics: - the virtual array doc is more fluffy, and if you agree with the general concept, the next logical move is probably to put out some interfaces indeed - the random access doc g

Re: Two proposals for expanding arrow Table API (virtual arrays and random access)

2020-06-17 Thread Wes McKinney
hi Radu, I'll read the proposals in more detail when I can and make comments, but this has always been something of interest (see, e.g. [1]). The intent with the "C++ data frames" project that we've discussed (and I continue to labor towards, e.g. recent compute engine work is directly in service

Two proposals for expanding arrow Table API (virtual arrays and random access)

2020-06-17 Thread Radu Teodorescu
Hi folks, While I’ve been communicating with some members of this group in the past, this is my first official post, so please excuse/correct/guide me as needed. Logistics first: I put most of the content of my proposals in a Google Doc, but if more appropriate, we can keep the conversation going b