I agree slicing can be tricky here.  Since slicing is not part of the
specification, maybe there should be two separate discussions here.  I'll
be honest, I forget exactly how slicing works in the C++ implementation,
but is

> Say you want to slice the RLE array from Logical Offset 4 (which doesn't
> fall on a run boundary). How do you represent that with Physical Offsets
> into Run ends and Values?
>
Do we need to solve this problem? Can we keep the logical offset as part of
the RLE array and slice the Run ends buffer and the Values array at the same
time?

> Say you have the logical values: [5, 5, 5, 6, 6, 7, 7, 7]
>
> Run ends: [3, 5, 8]
> Values: [5, 6, 7]

So a slice at 4 would be:
Run ends: [5, 8]
Values: [6, 7]
This can be done in O(log N): the physical slice offset for the Values
array is the same as the physical slice offset for Run ends (the index of
the first run end greater than the logical offset).
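
To make the slicing step concrete, here is a rough Python sketch of what I have in mind. The function name slice_rle and the plain-list representation are just for illustration (assumptions on my part, not how the C++ implementation does it); the point is only that one binary search on the run ends gives the physical offset for both buffers, while the logical offset is kept alongside.

```python
from bisect import bisect_right

def slice_rle(run_ends, values, logical_offset):
    """Hypothetical helper: slice an RLE array at a logical offset.

    Returns the sliced run ends, the sliced values, and the logical
    offset, which is kept as part of the array (run ends are NOT rebased).
    """
    # Index of the first run end strictly greater than the logical offset.
    physical_offset = bisect_right(run_ends, logical_offset)
    # The same physical offset applies to both buffers.
    return run_ends[physical_offset:], values[physical_offset:], logical_offset

sliced_run_ends, sliced_values, offset = slice_rle([3, 5, 8], [5, 6, 7], 4)
print(sliced_run_ends, sliced_values, offset)  # [5, 8] [6, 7] 4
```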

When writing to the IPC format, you subtract the logical offset from the
run ends in the sliced buffer and write the result.  Values are written as
normal:
Run ends: [1, 4]
Values: [6, 7]

Which would reconstruct [6, 7, 7, 7].
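
As a sanity check on the arithmetic, a small sketch of the rebasing step. Both function names (rebase_run_ends_for_ipc, decode_rle) are made up for this example and plain Python lists stand in for the buffers; this is not the actual IPC writer.

```python
def rebase_run_ends_for_ipc(sliced_run_ends, logical_offset):
    """Hypothetical helper: subtract the logical offset from each run end."""
    return [end - logical_offset for end in sliced_run_ends]

def decode_rle(run_ends, values):
    """Expand run ends and values (with zero logical offset) to logical values."""
    decoded = []
    start = 0
    for end, value in zip(run_ends, values):
        # Each run covers logical positions [start, end).
        decoded.extend([value] * (end - start))
        start = end
    return decoded

ipc_run_ends = rebase_run_ends_for_ipc([5, 8], 4)
print(ipc_run_ends)                      # [1, 4]
print(decode_rle(ipc_run_ends, [6, 7]))  # [6, 7, 7, 7]
```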

For lookup of elements, one could add the logical offset to the index and
do the binary search as normal.
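
Roughly, in Python (get_value is a hypothetical name, and I am assuming the sliced array carries its un-rebased run ends plus the logical offset, as in the example above):

```python
from bisect import bisect_right

def get_value(run_ends, values, logical_offset, index):
    """Hypothetical lookup on a sliced RLE array that keeps a logical offset."""
    # Translate the slice-relative index to a position in the original array.
    absolute_index = logical_offset + index
    # Find the first run whose end is strictly greater than that position.
    physical_index = bisect_right(run_ends, absolute_index)
    return values[physical_index]

# Sliced array from the example: run ends [5, 8], values [6, 7], offset 4.
print([get_value([5, 8], [6, 7], 4, i) for i in range(4)])  # [6, 7, 7, 7]
```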

I guess this might be harder to implement based on the current slicing
implementation?  Or I might be missing something obvious?

Cheers,
Micah

On Thu, Sep 15, 2022 at 12:41 AM Antoine Pitrou <anto...@python.org> wrote:

> On Thu, 15 Sep 2022 09:25:53 +0200
> Antoine Pitrou <anto...@python.org> wrote:
> >
> > Why would the run ends and the values have the same offset?
> > Also, how do you interpret the run ends if you have a physical offset
> > into the values array?
> >
> >
> > Say you have the logical values: [5, 5, 5, 6, 6, 7, 7, 7]
> >
> > Run ends: [3, 5, 8]
> > Values: [5, 6, 7]
> >
> > Say you want to slice the RLE array from Logical Offset 4 (which doesn't
> > fall on a run boundary). How do you represent that with Physical Offsets
> > into Run ends and Values?
> >
> > As soon as you set a Physical Offset on the Values, the Run ends don't
> > match anymore.
>
> Hmm, part of my message does not make sense, sorry.
>
> That said, the question about representing a Logical Offset into the
> RLE array purely as Physical Offsets into Run ends and Values still
> holds :-)
>
> Regards
>
> Antoine.
>
>
>
