Hi Micah,

Thanks for the comments.
By storing the run-length ends (partial sum of run-lengths), it provides
better support for random access (O(log(n)), at the expense of larger
buffer width.

Generally, I think this is a better design, so the design should be changed
as follows:

2. the data structure of RleVector includes an inner vector, plus a buffer
storing the end indices for runs.
3. we provide random access, with time complexity O(log(n)), so it should
not be used frequently.

What do you think?

Best,
Liya Fan

On Thu, Aug 22, 2019 at 11:45 AM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> Hi Liya Fan,
> Perhaps comment on the original thread?  This differs from my proposal in
> terms on details of encoding.  For RLE, I proposed encoding run end indices
> instead of run-lengths.  This allows for sublinear access to elements at
> the cost of potentially larger bit-widths for the lengths.
>
>
> Thanks,
> Micah
>
> On Wed, Aug 21, 2019 at 6:50 PM Fan Liya <liya.fa...@gmail.com> wrote:
>
> > Hi Wes,
> >
> > Thanks for the good suggestion.
> > It is intended to be sent through IPC. So it should implement
> FieldVector,
> > not just ValueVector.
> >
> > This can be considered a sub-item of Micah's proposal about
> > compression/decompression.
> > I will spend more time on that discussion.
> >
> > Best,
> > Liya Fan
> >
> > On Wed, Aug 21, 2019 at 9:34 PM Wes McKinney <wesmck...@gmail.com>
> wrote:
> >
> > > hi Liya,
> > >
> > > Do you intend to be able to send RLE vectors using the IPC protocol?
> > > If so, we need to spend some time on Micah's discussion about
> > > sparseness and encodings/compression.
> > >
> > > - Wes
> > >
> > > On Wed, Aug 21, 2019 at 7:33 AM Fan Liya <liya.fa...@gmail.com> wrote:
> > > >
> > > > Dear all,
> > > >
> > > > RLE (run length encoding) is a widely used encoding/decoding
> technique.
> > > > Compared with other encoding/decoding techniques, it is easier to
> work
> > > with
> > > > the encoded data.
> > > >
> > > > We want to provide an RLE vector implementation in Arrow. The design
> > > > details include:
> > > >
> > > > 1. RleVector implements ValueVector.
> > > > 2. the data structure of RleVector includes an inner vector, plus a
> > > > repetition buffer.
> > > > 3. we do not provide random access over the RleVector
> > > > 4. In the future, we will provide iterators to access the vector in
> > > > sequence.
> > > > 5. RleVector does not support update, but supports appending.
> > > > 6. In the future, we will provide encoder/decoder to efficiently
> > > transform
> > > > encoded/decoded vectors.
> > > >
> > > > Please give your valuable feedback.
> > > >
> > > > Best,
> > > > Liya Fan
> > >
> >
>

Reply via email to