Re: [DISC] Improving Arrow's database support

2022-09-14 Thread David Li
I put up [1] as the PR to apache/arrow to vote on. There is a bit of a circular dependency here: my thought is that we will vote on this, then tag the 1.0.0 API standard on apache/arrow-adbc, and finally update the PR before merging. But actual releases of the packages may be a later commit/tag

Re: Integration between Flight and Acero

2022-09-14 Thread Li Jin
Thanks both for the suggestions, it makes sense. I will try with SourceNode with the factory method first because my service/client API doesn't support parallel read yet. (Parallel reading while preserving data ordering via flight protocol is something I thought about a little bit but probably som

Re: PRs for RLE support

2022-09-14 Thread Weston Pace
I'm going to bump this because it would be good to get feedback. In particular it would be nice to get feedback on the suggested format change[1]. We are currently moving forward on coming up with an IPC format proposal which we will share when ready. The two interesting points that jump out to

Re: PRs for RLE support

2022-09-14 Thread Micah Kornfield
> > * Should we encode "run lengths" or "run ends"? I think the project has leaned towards sublinear access, so run ends make sense. The downside is that we run into similar issues with List/LargeList where the total number of elements is limited by bit-width (which can also cause space wastage
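
For readers following the "run lengths" versus "run ends" question, here is a minimal Python sketch of why run ends allow sublinear access. It is an illustration only, not the proposed Arrow layout: the helper names `lengths_to_ends` and `value_at` are hypothetical, and run ends are assumed to be cumulative, exclusive end positions.

```python
from bisect import bisect_right
from itertools import accumulate


def lengths_to_ends(run_lengths):
    """Convert per-run lengths into cumulative (exclusive) run ends."""
    return list(accumulate(run_lengths))


def value_at(run_ends, values, i):
    """Look up logical element i with a binary search over run ends.

    With run lengths alone, the same lookup would need a linear scan
    that sums lengths until it passes i.
    """
    if not run_ends or not 0 <= i < run_ends[-1]:
        raise IndexError(i)
    # The first run whose end is strictly greater than i contains i.
    return values[bisect_right(run_ends, i)]


# Hypothetical data: three runs of lengths 3, 2 and 4.
ends = lengths_to_ends([3, 2, 4])
print(ends, value_at(ends, ["a", "b", "c"], 4))
```

The trade-off raised in the message is visible here: the last run end equals the total logical length, so the integer width chosen for run ends caps the number of elements the array can represent.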

Re: PRs for RLE support

2022-09-14 Thread Matthew Topol
Just wanted to chime in here that I also have several draft PRs for implementing the RLE arrays in Go as the second implementation (since we use two implementations as a requirement to vote on changes/additions to the format). They can be found here:

Re: PRs for RLE support

2022-09-14 Thread Dewey Dunnington
> * Should we encode "run lengths" or "run ends"? In addition to the points mentioned above, this seems the most consistent with the variable-length binary/list layouts > encoding the run ends as a buffer (similar to list array for example) makes it difficult to calculate offsets I don't have a

I need C++ tutoring

2022-09-14 Thread Mauricio Vargas Sepúlveda
Hi! I'm looking for an instructor who can explain to me how to correctly write a C++ function. I know some Rcpp and I need to translate a function to cpp11 (R). I have some funds to pay for tutoring. What I'm experiencing is described here: https://github.com/pachadotdev/fixest/issues/18#issu

Re: PRs for RLE support

2022-09-14 Thread Weston Pace
I will clarify the offset problem. It essentially boils down to "if you don't have constant access to elements then an array length offset does not give you constant access to buffer offsets". We start with an RLE array of length 200. We slice it with (start=10, length=100) to get an RLE array o
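
The offset problem can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the implementation discussed in the thread: `physical_range` is a hypothetical helper, the run ends `[50, 120, 200]` are made-up data for an array of logical length 200, and run ends are assumed to be cumulative, exclusive end positions.

```python
from bisect import bisect_right


def physical_range(run_ends, offset, length):
    """Map a logical slice to (first_run, run_count) in the run ends buffer.

    The logical offset alone does not say which run it falls in, so each
    boundary needs a binary search rather than constant-time arithmetic.
    """
    if offset < 0 or length < 0 or offset + length > (run_ends[-1] if run_ends else 0):
        raise IndexError((offset, length))
    if length == 0:
        return (0, 0)
    first = bisect_right(run_ends, offset)
    last = bisect_right(run_ends, offset + length - 1)
    return (first, last - first + 1)


# Hypothetical run ends for an array of logical length 200.
run_ends = [50, 120, 200]
# Slice (start=10, length=100) covers logical positions 10..109.
print(physical_range(run_ends, 10, 100))
```

For a fixed-width primitive array, the buffer offset of a slice is simply the logical offset times the element width; in this sketch, by contrast, the physical position depends on the contents of the run ends.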

Re: Arrow sync call September 14 at 12:00 US/Eastern, 16:00 UTC

2022-09-14 Thread Will Jones
Attendees: - Will Jones - Ian Joiner - Jacob Wujciak - Dhamo - Dewey Dunnington - Sean Gallagher - Ashish Paliwal - Rok Mihevc - James Duong - Bryce Mecum - Anja Boskovic - Matt Topol - David Li Discussion: RLE Progress - Dewey considering implementing

Re: PRs for RLE support

2022-09-14 Thread Dewey Dunnington
Thanks for the clarification! The (probably very common) slicing case makes a lot of sense. On Wed, Sep 14, 2022 at 3:19 PM Weston Pace wrote: > I will clarify the offset problem. It essentially boils down to "if > you don't have constant access to elements then an array length offset > does no

Re: PRs for RLE support

2022-09-14 Thread Matthew Topol
> On the other hand, if there were two child arrays then an implementation, when slicing, could choose to always keep the offset of the parent array at 0 and instead put the offsets in the child arrays. Now you have a parent array with offset 0, a run ends (int32) array with offset 74 and length

Re: [Flight][Java][JDBC] IP clearance of Flight JDBC Driver

2022-09-14 Thread David Li
It's been a long time coming, but the Flight SQL JDBC driver is now merged. There are many improvements to make [1] but I think it'll be easier to do those as small PRs against the main branch rather than keep this mega branch alive. And it will hopefully unblock other contributors who are trying

Re: [Flight][Java][JDBC] IP clearance of Flight JDBC Driver

2022-09-14 Thread Ray Lum
Wonderful. Thank you, David, for your amazing efforts to get this merged! On Wed, Sep 14, 2022 at 11:48 AM David Li wrote: > It's been a long time coming, but the Flight SQL JDBC driver is now merged. > There are many improvements to make [1] but I think it'll be easier to do > those as small PRs

Re: PRs for RLE support

2022-09-14 Thread Tobias Zagorni
On Wednesday, 2022-09-14 at 14:33 -0400, Matthew Topol wrote: > Doesn't this explanation conflate the Logical Offset (the parent's > offset) and the Physical Offset (the offset of the run ends / values > children)? Or am I missing something? If you have an RLE array > of length 200, and you s

Re: PRs for RLE support

2022-09-14 Thread Weston Pace
> The downside is that we run into similar issues with List/LargeList > where the total number of elements is limited by bit-width (which can also > cause space wastage, e.g. with run ends it might be reasonable to limit > bit-width to 16). True. I guess another question would be whether we want