I'm in favor of this, but still think we are gather feedback on the
proposal, so we should hold off on coding these up, until we have consensus
on the approach.

Thanks,
Micah

On Wed, Aug 21, 2019 at 9:22 PM Fan Liya <liya.fa...@gmail.com> wrote:

> Hi Micah,
>
> Thanks for the comments.
> By storing the run-length ends (partial sum of run-lengths), it provides
> better support for random access (O(log(n)), at the expense of larger
> buffer width.
>
> Generally, I think this is a better design, so the design should be
> changed as follows:
>
> 2. the data structure of RleVector includes an inner vector, plus a buffer
> storing the end indices for runs.
> 3. we provide random access, with time complexity O(log(n)), so it should
> not be used frequently.
>
> What do you think?
>
> Best,
> Liya Fan
>
> On Thu, Aug 22, 2019 at 11:45 AM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>> Hi Liya Fan,
>> Perhaps comment on the original thread?  This differs from my proposal in
>> terms on details of encoding.  For RLE, I proposed encoding run end
>> indices
>> instead of run-lengths.  This allows for sublinear access to elements at
>> the cost of potentially larger bit-widths for the lengths.
>>
>>
>> Thanks,
>> Micah
>>
>> On Wed, Aug 21, 2019 at 6:50 PM Fan Liya <liya.fa...@gmail.com> wrote:
>>
>> > Hi Wes,
>> >
>> > Thanks for the good suggestion.
>> > It is intended to be sent through IPC. So it should implement
>> FieldVector,
>> > not just ValueVector.
>> >
>> > This can be considered a sub-item of Micah's proposal about
>> > compression/decompression.
>> > I will spend more time on that discussion.
>> >
>> > Best,
>> > Liya Fan
>> >
>> > On Wed, Aug 21, 2019 at 9:34 PM Wes McKinney <wesmck...@gmail.com>
>> wrote:
>> >
>> > > hi Liya,
>> > >
>> > > Do you intend to be able to send RLE vectors using the IPC protocol?
>> > > If so, we need to spend some time on Micah's discussion about
>> > > sparseness and encodings/compression.
>> > >
>> > > - Wes
>> > >
>> > > On Wed, Aug 21, 2019 at 7:33 AM Fan Liya <liya.fa...@gmail.com>
>> wrote:
>> > > >
>> > > > Dear all,
>> > > >
>> > > > RLE (run length encoding) is a widely used encoding/decoding
>> technique.
>> > > > Compared with other encoding/decoding techniques, it is easier to
>> work
>> > > with
>> > > > the encoded data.
>> > > >
>> > > > We want to provide an RLE vector implementation in Arrow. The design
>> > > > details include:
>> > > >
>> > > > 1. RleVector implements ValueVector.
>> > > > 2. the data structure of RleVector includes an inner vector, plus a
>> > > > repetition buffer.
>> > > > 3. we do not provide random access over the RleVector
>> > > > 4. In the future, we will provide iterators to access the vector in
>> > > > sequence.
>> > > > 5. RleVector does not support update, but supports appending.
>> > > > 6. In the future, we will provide encoder/decoder to efficiently
>> > > transform
>> > > > encoded/decoded vectors.
>> > > >
>> > > > Please give your valuable feedback.
>> > > >
>> > > > Best,
>> > > > Liya Fan
>> > >
>> >
>>
>

Reply via email to