> My understanding is that the primary benefit of this ListView layout
> over Arrow's existing List layouts [1] is that ListView allows for
> buffer alignment [2] without padding, which makes vectorized
> processing much more efficient. Is this understanding correct?

Yes. Though proponents of list-view would probably point out that it doesn't prevent you from having contiguous buffers, it simply doesn't require it.

> Unless I am missing something, I think the selection use-case
> could be equally well served by a dictionary-encoded BinaryArray/ListArray,
> and would have the benefit of not requiring any modifications to the
> existing format or kernels.

This is a good point that did not come up in the previous discussion that I can see.

> The major additional flexibility of the proposed encoding would be permitting disjoint
> or overlapping ranges, are these common enough in practice to represent a meaningful bottleneck?

I'm not sure. There was one other use case that was brought up in the original discussion: list view arrays can be constructed in parallel. That is, if you know the output size (e.g. when applying a large scalar function), then you can have different threads fill out different regions of the offsets / lengths buffers. That being said, I don't know for certain if anyone is relying on this behavior.

On Wed, Apr 26, 2023 at 7:12 AM Felipe Oliveira Carvalho <felipe...@gmail.com> wrote:

> After Weston's suggestion above, I've renamed files and classes in my WIP
> implementation:
>
> ArrayView -> ListView
>
> On Wed, Apr 26, 2023 at 11:08 AM Ian Cook <i...@ursacomputing.com> wrote:
>
> > +1 to what Weston and Joris suggested regarding the name. "ListView"
> > seems like the best name to use for this layout in Arrow.
> >
> > My understanding is that the primary benefit of this ListView layout
> > over Arrow's existing List layouts [1] is that ListView allows for
> > buffer alignment [2] without padding, which makes vectorized
> > processing much more efficient. Is this understanding correct?
> >
> > [1] https://arrow.apache.org/docs/format/Columnar.html#variable-size-list-layout
> > [2] https://arrow.apache.org/docs/format/Columnar.html#buffer-alignment-and-padding
> >
> > Ian
> >
> > On Wed, Apr 26, 2023 at 5:27 AM Joris Van den Bossche
> > <jorisvandenboss...@gmail.com> wrote:
> > >
> > > On Wed, 26 Apr 2023 at 02:37, Weston Pace <weston.p...@gmail.com> wrote:
> > > >
> > > > For context, there was some discussion on this back in [1]. At that time
> > > > this was called "sequence view" but I do not like that name. However,
> > > > array-view array is a little confusing. Given this is similar to list can
> > > > we go with list-view array?
> > >
> > > Yes, given that this is essentially an alternative representation of a
> > > logical "list" array, I would also prefer that we use the term "list"
> > > in the name for such a new type. The word "array" has a different
> > > meaning in context of our columnar specification.
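
P.S. To make the parallel-construction use case mentioned in my reply concrete, here is a minimal sketch in plain Python. It does not use any Arrow API. The names `build_list_view_parallel` and `decode_list_view` are hypothetical, and it assumes an upper bound (`max_len`) on each output list is known up front, so every row can be given its own reserved slot in the values buffer. Each worker then writes its rows' offsets and sizes without needing the lengths of earlier rows, and the unused slots are simply left as gaps, which a list-view style layout would tolerate.

```python
from concurrent.futures import ThreadPoolExecutor


def build_list_view_parallel(rows, func, max_len, num_workers=4):
    """Apply func to every row and store the results as (offsets, sizes, values).

    Assumption: func(row) returns a list of at most max_len items, so row i
    can be assigned the reserved region [i * max_len, (i + 1) * max_len).
    """
    n = len(rows)
    offsets = [0] * n
    sizes = [0] * n
    values = [None] * (n * max_len)  # worst-case reservation; gaps are allowed

    def fill(start, stop):
        # Each worker touches only its own rows and their reserved regions.
        for i in range(start, stop):
            out = func(rows[i])
            if len(out) > max_len:
                raise ValueError("output longer than the reserved region")
            base = i * max_len
            values[base:base + len(out)] = out  # same-length slice assignment
            offsets[i] = base
            sizes[i] = len(out)

    chunk = max(1, (n + num_workers - 1) // num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(fill, s, min(s + chunk, n))
                   for s in range(0, n, chunk)]
        for future in futures:
            future.result()  # re-raise any worker exception
    return offsets, sizes, values


def decode_list_view(offsets, sizes, values):
    """Materialize the logical lists from the offsets/sizes/values triple."""
    return [values[o:o + s] for o, s in zip(offsets, sizes)]
```

With a regular offsets-only list layout, the offset of row i depends on the total length of all earlier rows, so the same fill step would need a prefix sum (or a compaction pass) before or after the parallel work. Whether that saving matters in practice is exactly the open question above.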