Felipe, thank you for bringing this up. Another approach, sometimes used in database engines like DuckDB and often called selection vectors, is to store an additional bitmask that says which elements of the array are "selected" and which are ignored, so that the array plus the bitmask functions like a view.
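A minimal Python sketch of that idea follows. It assumes the selection is stored as one 0/1 entry per row. The helpers `apply_selection` and `bitmask_nbytes` are hypothetical names, not DuckDB or Arrow APIs, and DuckDB's own in-memory representation may differ.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def apply_selection(values: Sequence[T], selection: Sequence[int]) -> List[T]:
    """Return the rows of `values` whose selection entry is 1.

    `selection` is modelled as an unpacked list of 0/1 flags, one per row;
    rows flagged 0 are skipped, so (values, selection) together act as a view.
    """
    if len(selection) != len(values):
        raise ValueError("selection must have exactly one entry per row")
    return [v for v, keep in zip(values, selection) if keep]


def bitmask_nbytes(num_rows: int) -> int:
    """Bytes needed to pack one selection bit per row, however few are selected."""
    return (num_rows + 7) // 8


print(apply_selection(["a", "b", "c", "d"], [1, 0, 0, 1]))  # ['a', 'd']
```

Packed one bit per row, a selection over 1,000,000 rows needs `bitmask_nbytes(1_000_000)`, i.e. 125,000 bytes, no matter how few rows are selected. An int32 (offset, size) view of one contiguous run takes 8 bytes. That is the sparse-selection trade-off.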
For example, a selection vector {0, 1, 1, 0, 1} would represent a view of the second, third, and fifth rows. I think the selection vector is as general as the ArrayVector format you describe, and likely simpler to implement (especially in compute kernels). The downside is that for very sparse selections on very large arrays, the selection vector may be larger than the equivalent array view.

Have you considered such an approach?

Andrew

On Wed, Apr 26, 2023 at 1:27 AM wish maple <maplewish...@gmail.com> wrote:

> I think the ArrayVector format would have the following benefits:
> 1. Converting a batch in Velox or another system to an Arrow array could be
> much more lightweight.
> 2. Modifying, filtering, and copying arrays or strings could be much more
> lightweight.
>
> Velox can make a Vector mutable, while it seems an Arrow array cannot, but
> that seems to make little difference here.
>
> On 2023/04/25 22:00:08 Felipe Oliveira Carvalho wrote:
> > Hi folks,
> >
> > I would like to start a public discussion on the inclusion of a new array
> > format in Arrow: the array-view array. The name is also up for debate.
> >
> > This format is inspired by Velox's ArrayVector format [1]. Logically, this
> > array represents an array of arrays. Each element is an array-view (an
> > offset and size pair) that points to a range within a nested "values"
> > array (called "elements" in the Velox docs). The nested array can be of
> > any type, which makes this format very flexible and powerful.
> >
> > [image: ../_images/array-vector.png]
> > <https://facebookincubator.github.io/velox/_images/array-vector.png>
> >
> > I'm currently working on a C++ implementation and plan to work on a Go
> > implementation to fulfill the two-implementations requirement for format
> > changes.
> >
> > The draft design:
> >
> > - 3 buffers: [validity_bitmap, int32 offsets buffer, int32 sizes buffer]
> > - 1 child array: "values", an array of the type parameter
> >
> > validity_bitmap is used to differentiate between empty array views
> > (sizes[i] == 0) and NULL array views (validity_bitmap[i] == 0).
> >
> > When validity_bitmap[i] is 0, both sizes[i] and offsets[i] are undefined
> > (as usual), and when sizes[i] == 0, offsets[i] is undefined. In these
> > undefined cases, 0 is recommended if setting a value is not an issue for
> > the system producing the arrays.
> >
> > The offsets buffer is not required to be ordered, and views don't have to
> > be disjoint.
> >
> > [1]
> > https://facebookincubator.github.io/velox/develop/vectors.html#arrayvector
> >
> > Thanks,
> > Felipe O. Carvalho
> >
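To make the draft layout concrete, here is a minimal Python sketch under a few assumptions. It is not the pyarrow API or the C++/Go implementation being worked on. The class `ArrayViewArray` and its `validate` and `take` methods are hypothetical names. The validity bitmap is modelled as one unpacked 0/1 entry per element instead of packed bits. It shows NULL vs. empty views, unordered offsets, overlapping views, and a filter that shares the child values.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence


@dataclass
class ArrayViewArray:
    """Hypothetical model of the draft array-view layout.

    Element i is NULL when validity[i] == 0; otherwise it is the slice
    values[offsets[i] : offsets[i] + sizes[i]]. Offsets need not be sorted
    and views may overlap.
    """

    validity: List[int]    # one 0/1 entry per element (unpacked, for clarity)
    offsets: List[int]     # int32 in the proposal
    sizes: List[int]       # int32 in the proposal
    values: Sequence[Any]  # the child "values" array, of any type

    def __len__(self) -> int:
        return len(self.offsets)

    def __getitem__(self, i: int) -> Optional[List[Any]]:
        if not self.validity[i]:
            return None  # NULL view: offsets[i] and sizes[i] are not read
        if self.sizes[i] == 0:
            return []  # empty view: offsets[i] is undefined and not read
        start = self.offsets[i]
        return list(self.values[start:start + self.sizes[i]])

    def validate(self) -> None:
        """Check that every valid, non-empty view stays inside `values`."""
        n = len(self.offsets)
        if len(self.sizes) != n or len(self.validity) != n:
            raise ValueError("validity, offsets and sizes must have equal length")
        for i in range(n):
            if not self.validity[i] or self.sizes[i] == 0:
                continue  # undefined offsets/sizes are not checked
            if self.offsets[i] < 0 or self.sizes[i] < 0:
                raise ValueError(f"negative offset or size in view {i}")
            if self.offsets[i] + self.sizes[i] > len(self.values):
                raise ValueError(f"view {i} runs past the end of values")

    def take(self, indices: Sequence[int]) -> "ArrayViewArray":
        """Select elements by position without copying the child values."""
        return ArrayViewArray(
            validity=[self.validity[i] for i in indices],
            offsets=[self.offsets[i] for i in indices],
            sizes=[self.sizes[i] for i in indices],
            values=self.values,  # shared, not copied
        )


# Views 0 and 1 overlap on values[2:4], and offsets are not sorted.
arr = ArrayViewArray(
    validity=[1, 1, 0, 1],
    offsets=[2, 0, 0, 0],
    sizes=[3, 4, 0, 0],
    values=[1, 2, 3, 4, 5],
)
arr.validate()
print([arr[i] for i in range(len(arr))])  # [[3, 4, 5], [1, 2, 3, 4], None, []]
```

In this sketch, `take` gathers only the per-element validity, offsets, and sizes and reuses the child `values` untouched, so its cost does not depend on how large the selected sub-arrays are.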