> { > length: 2 > offset: 6 > rle: { > length: 1 // actually physical length > offset: 2 > buffer: [3, 5,8] > } > values: { > length: 1 > offset: 2 > buffer: [5, 6, 7] > } > } > Does this make sense?
I think this is a valid way of doing it. There are 2 problems I see with this variant: - The Slice function would need to be modified to to specially handle RLE. Slicing RLE based on a logical offset will now always be O(log(n)) . Slicing by just setting offset/length of the main array will no longer work. (which may not be that bad performance-wise, since in a lot of cases we can't get around resolving the phyiscal offset at some point). - How would we slice nested types, i.e a struct where some fields are run-length encoded? - We don't touch the rle fields since they are struct fields and these are not touched during slicing -> for anything where RLE is nested there is still only a logical offset. Now performance characteristics are also inconsistent between rle and struct<rle,...> - We change the RLE child's run ends/values arrays. Now we have a struct field that is no longer a valid array on its own, likely breaking a lot of things > Apologies Weston, if this is what you were getting at, but I think it > is > slightly different because you talked about 0 offsets in your > example. > > > > > So a slice at 4 would be: > > > Run ends: [5, 8] > > > Values: [6, 7] > > > How do you interpret that? > > Naively, that means the logical values [6, 6, 6, 6, 6, 7, 7, 7] > > It doesn't seem right... we could find the logical offset by looking into run_ends_array[-1] if run_ends_array.offset is not 0. That said the current code does not support that. > Best, Tobias