For context, there was some discussion of this back in [1].  At that time
it was called "sequence view", but I do not like that name.  However,
"array-view array" is a little confusing.  Given that this is similar to
list, can we go with "list-view array"?

> Thanks for the introduction. I'd be interested to hear about the
> applications Velox has found for these vectors, and in what situations
> they are useful. This could be contrasted with the current ListArray
> implementations.

I believe one significant benefit is that take (and by proxy, filter) and
sort are O(# of items) with the proposed format and O(# of bytes) with the
current format.  Jorge did some profiling to this effect in [1].
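
To make the complexity argument concrete, here is a rough sketch in plain
Python.  Lists stand in for Arrow buffers, validity bitmaps are left out,
and the function names (take_list, take_list_view, to_pylist) are made up
for illustration; they are not Arrow or Velox APIs.

```python
# Illustrative sketch only: plain Python lists stand in for Arrow buffers.
# Validity bitmaps are omitted and all function names are hypothetical.

def take_list(offsets, values, indices):
    """Take on the current list layout (n + 1 ordered offsets).

    Offsets must stay ordered and contiguous, so the selected child
    values have to be copied into a new child array.
    """
    new_offsets = [0]
    new_values = []
    for i in indices:
        new_values.extend(values[offsets[i]:offsets[i + 1]])  # copies child data
        new_offsets.append(len(new_values))
    return new_offsets, new_values


def take_list_view(offsets, sizes, values, indices):
    """Take on the proposed layout (offset and size pair per element).

    Only the (offset, size) pairs are gathered; the child array is shared.
    """
    new_offsets = [offsets[i] for i in indices]
    new_sizes = [sizes[i] for i in indices]
    return new_offsets, new_sizes, values


def to_pylist(offsets, sizes, values):
    """Materialize the views as nested Python lists, for checking results."""
    return [values[o:o + s] for o, s in zip(offsets, sizes)]


values = [1, 2, 3, 4, 5, 6]
indices = [2, 0, 2]

# Current layout: [[1, 2], [], [3, 4, 5, 6]]
list_offsets, list_values = take_list([0, 2, 2, 6], values, indices)

# Proposed layout holding the same logical data
view_offsets, view_sizes, view_values = take_list_view(
    [0, 2, 2], [2, 0, 4], values, indices
)
```

In this sketch take_list ends up with a 10-element child array, while
take_list_view returns the original 6-element child untouched, with
overlapping and out-of-order views.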

[1] https://lists.apache.org/thread/49qzofswg1r5z7zh39pjvd1m2ggz2kdq

On Tue, Apr 25, 2023 at 3:13 PM Will Jones <will.jones...@gmail.com> wrote:

> Hi Felipe,
>
> Thanks for the introduction. I'd be interested to hear about the
> applications Velox has found for these vectors, and in what situations they
> are useful. This could be contrasted with the current ListArray
> implementations.
>
> IIUC it would be fairly cheap to transform a ListArray to an ArrayView, but
> expensive to go the other way.
>
> Best,
>
> Will Jones
>
> On Tue, Apr 25, 2023 at 3:00 PM Felipe Oliveira Carvalho <
> felipe...@gmail.com> wrote:
>
> > Hi folks,
> >
> > I would like to start a public discussion on the inclusion of a new array
> > format to Arrow — array-view array. The name is also up for debate.
> >
> > This format is inspired by Velox's ArrayVector format [1]. Logically,
> > this array represents an array of arrays. Each element is an array-view
> > (offset and size pair) that points to a range within a nested "values"
> > array (called "elements" in Velox docs). The nested array can be of any
> > type, which makes this format very flexible and powerful.
> >
> > [image: ../_images/array-vector.png]
> > <https://facebookincubator.github.io/velox/_images/array-vector.png>
> >
> > I'm currently working on a C++ implementation and plan to work on a Go
> > implementation to fulfill the two-implementations requirement for format
> > changes.
> >
> > The draft design:
> >
> > - 3 buffers: [validity_bitmap, int32 offsets buffer, int32 sizes buffer]
> > - 1 child array: "values" as an array of the type parameter
> >
> > validity_bitmap is used to differentiate between empty array views
> > (sizes[i] == 0) and NULL array views (validity_bitmap[i] == 0).
> >
> > When the validity_bitmap[i] is 0, both sizes and offsets are undefined
> > (as usual), and when sizes[i] == 0, offsets[i] is undefined. 0 is
> > recommended if setting a value is not an issue to the system producing
> > the arrays.
> >
> > offsets buffer is not required to be ordered and views don't have to be
> > disjoint.
> >
> > [1]
> > https://facebookincubator.github.io/velox/develop/vectors.html#arrayvector
> >
> > Thanks,
> > Felipe O. Carvalho
> >
>
