Hi Micah,

Sounds good. Thanks.

I have prepared some initial code, in the hope that it will make
discussions easier.
Anyway, we can ignore it for now, until we have consensus.

Best,
Liya Fan

On Fri, Aug 23, 2019 at 11:05 AM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> I'm in favor of this, but I think we are still gathering feedback on the
> proposal, so we should hold off on coding these up until we have consensus
> on the approach.
>
> Thanks,
> Micah
>
> On Wed, Aug 21, 2019 at 9:22 PM Fan Liya <liya.fa...@gmail.com> wrote:
>
>> Hi Micah,
>>
>> Thanks for the comments.
>> Storing the run ends (partial sums of the run lengths) provides better
>> support for random access, O(log(n)), at the expense of a larger
>> buffer width.
>>
>> Generally, I think this is a better design, so the design should be
>> changed as follows:
>>
>> 2. the data structure of RleVector includes an inner vector, plus a
>> buffer storing the end indices for runs.
>> 3. we provide random access, with time complexity O(log(n)), so it should
>> not be used frequently.
>>
>> What do you think?
>>
>> Best,
>> Liya Fan
>>
>> On Thu, Aug 22, 2019 at 11:45 AM Micah Kornfield <emkornfi...@gmail.com>
>> wrote:
>>
>>> Hi Liya Fan,
>>> Perhaps comment on the original thread?  This differs from my proposal in
>>> terms of the details of the encoding.  For RLE, I proposed encoding run end
>>> indices instead of run-lengths.  This allows for sublinear access to
>>> elements at the cost of potentially larger bit-widths for the lengths.
>>>
>>>
>>> Thanks,
>>> Micah
>>>
>>> On Wed, Aug 21, 2019 at 6:50 PM Fan Liya <liya.fa...@gmail.com> wrote:
>>>
>>> > Hi Wes,
>>> >
>>> > Thanks for the good suggestion.
>>> > It is intended to be sent through IPC. So it should implement
>>> > FieldVector, not just ValueVector.
>>> >
>>> > This can be considered a sub-item of Micah's proposal about
>>> > compression/decompression.
>>> > I will spend more time on that discussion.
>>> >
>>> > Best,
>>> > Liya Fan
>>> >
>>> > On Wed, Aug 21, 2019 at 9:34 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>> >
>>> > > hi Liya,
>>> > >
>>> > > Do you intend to be able to send RLE vectors using the IPC protocol?
>>> > > If so, we need to spend some time on Micah's discussion about
>>> > > sparseness and encodings/compression.
>>> > >
>>> > > - Wes
>>> > >
>>> > > On Wed, Aug 21, 2019 at 7:33 AM Fan Liya <liya.fa...@gmail.com> wrote:
>>> > > >
>>> > > > Dear all,
>>> > > >
>>> > > > RLE (run length encoding) is a widely used encoding/decoding technique.
>>> > > > Compared with other encoding/decoding techniques, it is easier to work
>>> > > > with the encoded data.
>>> > > >
>>> > > > We want to provide an RLE vector implementation in Arrow. The design
>>> > > > details include:
>>> > > >
>>> > > > 1. RleVector implements ValueVector.
>>> > > > 2. the data structure of RleVector includes an inner vector, plus a
>>> > > > repetition buffer.
>>> > > > 3. we do not provide random access over the RleVector
>>> > > > 4. In the future, we will provide iterators to access the vector in
>>> > > > sequence.
>>> > > > 5. RleVector does not support update, but supports appending.
>>> > > > 6. In the future, we will provide encoder/decoder to efficiently
>>> > > > transform encoded/decoded vectors.
>>> > > >
>>> > > > Please give your valuable feedback.
>>> > > >
>>> > > > Best,
>>> > > > Liya Fan
>>> > >
>>> >
>>>
>>
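
The run-end layout discussed in this thread can be sketched in Java roughly as
follows. This is a minimal illustration under stated assumptions, not the
proposed RleVector or any Arrow API: the class RunEndSketch and its methods are
hypothetical names, plain int[] arrays stand in for Arrow buffers, and every
run is assumed to have length at least 1 so that the run ends are strictly
increasing.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the Arrow Java API.
final class RunEndSketch {

    // Converts run lengths, e.g. {3, 2, 4}, into exclusive run ends {3, 5, 9}
    // (the partial sums of the run lengths).
    static int[] toRunEnds(int[] runLengths) {
        int[] runEnds = new int[runLengths.length];
        int sum = 0;
        for (int i = 0; i < runLengths.length; i++) {
            sum += runLengths[i];
            runEnds[i] = sum;
        }
        return runEnds;
    }

    // Returns the index of the run that contains logical position `index`,
    // using a binary search over the run ends (logarithmic in the run count).
    // Assumes runEnds is strictly increasing (no zero-length runs).
    static int findRun(int[] runEnds, int index) {
        if (runEnds.length == 0 || index < 0
                || index >= runEnds[runEnds.length - 1]) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        // We need the first run whose end is greater than `index`,
        // which is the first run whose end is >= index + 1.
        int pos = Arrays.binarySearch(runEnds, index + 1);
        return pos >= 0 ? pos : -pos - 1;
    }

    // Random access: looks up the value stored for the run covering `index`.
    static int valueAt(int[] values, int[] runEnds, int index) {
        return values[findRun(runEnds, index)];
    }
}
```

For run lengths {3, 2, 4} the run ends are {3, 5, 9}, and logical index 4 falls
in run 1. With raw run lengths the same lookup would need a linear scan, which
is the trade-off against the wider run-end values mentioned above.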
