Hi Liya Fan,

Could you comment on the original thread? This differs from my proposal in the details of the encoding. For RLE, I proposed encoding run-end indices instead of run lengths. This allows sublinear access to elements, at the cost of potentially larger bit-widths than the run lengths would need.
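To make the trade-off concrete, here is a minimal sketch of lookup over run ends. It is only an illustration: the class and method names (RunEndSketch, toRunEnds, findRun, get) are hypothetical and not part of the Arrow Java API, plain int arrays stand in for Arrow buffers, and every run is assumed to have a positive length.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Arrow.
final class RunEndSketch {

    // Convert run lengths, e.g. {3, 2, 4}, into exclusive run ends {3, 5, 9}.
    // Run ends are cumulative, so they grow up to the logical length.
    static int[] toRunEnds(int[] runLengths) {
        int[] runEnds = new int[runLengths.length];
        int total = 0;
        for (int i = 0; i < runLengths.length; i++) {
            total += runLengths[i];
            runEnds[i] = total;
        }
        return runEnds;
    }

    // Binary search for the first run whose end is strictly greater than
    // the logical index. This takes O(log r) steps for r runs.
    static int findRun(int[] runEnds, int index) {
        if (runEnds.length == 0 || index < 0 || index >= runEnds[runEnds.length - 1]) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        int lo = 0;
        int hi = runEnds.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (runEnds[mid] > index) {
                hi = mid;      // the answer is mid or somewhere to its left
            } else {
                lo = mid + 1;  // run mid ends at or before index
            }
        }
        return lo;
    }

    // Random access: look up the run, then read its value.
    static int get(int[] values, int[] runEnds, int index) {
        return values[findRun(runEnds, index)];
    }

    public static void main(String[] args) {
        int[] values = {7, 8, 9};
        int[] runEnds = toRunEnds(new int[] {3, 2, 4});
        System.out.println(Arrays.toString(runEnds));
        System.out.println(get(values, runEnds, 4));
    }
}
```

With run lengths, the same lookup has to sum lengths from the start, which is linear in the number of runs. The price is that the last run end equals the total logical length, so the run-end buffer may need a wider integer type than the individual lengths would.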
Thanks,
Micah

On Wed, Aug 21, 2019 at 6:50 PM Fan Liya <liya.fa...@gmail.com> wrote:

> Hi Wes,
>
> Thanks for the good suggestion.
> It is intended to be sent through IPC. So it should implement FieldVector,
> not just ValueVector.
>
> This can be considered a sub-item of Micah's proposal about
> compression/decompression.
> I will spend more time on that discussion.
>
> Best,
> Liya Fan
>
> On Wed, Aug 21, 2019 at 9:34 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > hi Liya,
> >
> > Do you intend to be able to send RLE vectors using the IPC protocol?
> > If so, we need to spend some time on Micah's discussion about
> > sparseness and encodings/compression.
> >
> > - Wes
> >
> > On Wed, Aug 21, 2019 at 7:33 AM Fan Liya <liya.fa...@gmail.com> wrote:
> > >
> > > Dear all,
> > >
> > > RLE (run length encoding) is a widely used encoding/decoding technique.
> > > Compared with other encoding/decoding techniques, it is easier to work
> > > with the encoded data.
> > >
> > > We want to provide an RLE vector implementation in Arrow. The design
> > > details include:
> > >
> > > 1. RleVector implements ValueVector.
> > > 2. the data structure of RleVector includes an inner vector, plus a
> > > repetition buffer.
> > > 3. we do not provide random access over the RleVector
> > > 4. In the future, we will provide iterators to access the vector in
> > > sequence.
> > > 5. RleVector does not support update, but supports appending.
> > > 6. In the future, we will provide encoder/decoder to efficiently
> > > transform encoded/decoded vectors.
> > >
> > > Please give your valuable feedback.
> > >
> > > Best,
> > > Liya Fan