I'm not at all opposed to renaming it as `Run-End-Encoding` if that would be preferable. Hopefully others will chime in with their feedback.
--Matt On Wed, Dec 14, 2022 at 12:09 PM Ian Cook <i...@ursacomputing.com> wrote: > Thank you Matt, Tobias, and others for the great work on this. > > I am -0.5 on this proposal in its current form because (pardon the > pedantry) what we have implemented here is not run-length encoding; it > is run-end encoding. Based on community input, the choice was made to > store run ends instead of run lengths because this enables O(log(N)) > random access as opposed to O(N). This is a sensible choice, but it > comes with some trade-offs including limitations in array length > (which maybe not really a problem in practice) and lack of bit-for-bit > equivalence with RLE encodings that use run lengths like Velox's > SequenceVector encoding (which I think is a more serious problem in > practice). > > I believe that we should either: > (a) rename this to "run-end encoding" > (b) change this to a parameterized type called "run encoding" that > takes a Boolean parameter specifying whether run lengths or run ends > are stored. > > Ian > > On Wed, Dec 14, 2022 at 11:27 AM Matt Topol <zotthewiz...@gmail.com> > wrote: > > > > Hello, > > > > I'd like to propose adding the RLE type based on earlier > discussions[1][2] > > to the Arrow format: > > - Columnar Format description: > > > https://github.com/apache/arrow/pull/13333/files#diff-8b68cf6859e881f2357f5df64bb073135d7ff6eeb51f116418660b3856564c60 > > - Flatbuffers changes: > > > https://github.com/apache/arrow/pull/14176/files#diff-e54b4f5d2d279acc5d1df5df9a7636f0142a8041fe02f07034e0d8be48444b07 > > > > There is a proposed implementation available in both C++ (written by > Tobias > > Zagorni) and Go[3][4]. Both implementations have mostly the same tests > > implemented and were tested to be compatible over IPC with an archery > test. > > In both cases, the implementations are split out among several Draft PRs > so > > that they can be easily reviewed piecemeal if the vote is approved, with > > each Draft PR including the changes of the one before it. The links > > provided are the Draft PRs with the entirety of the changes included. > > > > The vote will be open for at least 72 hours. > > > > [ ] +1 add the proposed RLE type to the Apache Arrow format > > [ ] -1 do not add the proposed RLE type to the Apache Arrow format > > because... > > > > Thanks much, and please let me know if any more information or links are > > needed (I've never proposed a vote before on here!) > > > > --Matt > > > > [1] https://lists.apache.org/thread/bfz3m5nyf7flq7n6q9b1bx3jhcn4wq29 > > [2] https://lists.apache.org/thread/xb7c723csrtwt0md3m4p56bt0193n7jq > > [3] https://github.com/apache/arrow/pull/14179 > > [4] https://github.com/apache/arrow/pull/14223 >