I think an ArrayVector-style layout would bring the following benefits:

1. Converting a batch from Velox or another system to an Arrow array could be much more lightweight.
2. Modifying, filtering, and copying arrays or strings could be much more lightweight.
Velox can make a Vector mutable, whereas it seems an Arrow array cannot. That seems to make little difference here, though.

On 2023/04/25 22:00:08 Felipe Oliveira Carvalho wrote:
> Hi folks,
>
> I would like to start a public discussion on the inclusion of a new array
> format to Arrow: array-view array. The name is also up for debate.
>
> This format is inspired by Velox's ArrayVector format [1]. Logically, this
> array represents an array of arrays. Each element is an array-view (offset
> and size pair) that points to a range within a nested "values" array
> (called "elements" in Velox docs). The nested array can be of any type,
> which makes this format very flexible and powerful.
>
> [image: ../_images/array-vector.png]
> <https://facebookincubator.github.io/velox/_images/array-vector.png>
>
> I'm currently working on a C++ implementation and plan to work on a Go
> implementation to fulfill the two-implementations requirement for format
> changes.
>
> The draft design:
>
> - 3 buffers: [validity_bitmap, int32 offsets buffer, int32 sizes buffer]
> - 1 child array: "values" as an array of the type parameter
>
> validity_bitmap is used to differentiate between empty array views
> (sizes[i] == 0) and NULL array views (validity_bitmap[i] == 0).
>
> When the validity_bitmap[i] is 0, both sizes and offsets are undefined (as
> usual), and when sizes[i] == 0, offsets[i] is undefined. 0 is recommended
> if setting a value is not an issue to the system producing the arrays.
>
> offsets buffer is not required to be ordered and views don't have to be
> disjoint.
>
> [1]
> https://facebookincubator.github.io/velox/develop/vectors.html#arrayvector
>
> Thanks,
> Felipe O. Carvalho
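To make the draft layout and the "lightweight filter" point concrete, here is a rough C++ sketch. It is only an illustration under my own assumptions, not Arrow's or Velox's API: `ListViewArray`, `Get`, and `Filter` are hypothetical names, the child "values" array is int32 purely for simplicity (the draft allows any child type), `std::shared_ptr` stands in for a reference-counted buffer, and the validity bitmap uses Arrow's usual LSB-first bit order.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical illustration of the draft layout, not an Arrow or Velox API.
// The child "values" array is int32 here only for simplicity; the draft
// allows a child of any type.
struct ListViewArray {
  std::vector<uint8_t> validity;  // bit i (LSB first) set => view i is non-null
  std::vector<int32_t> offsets;   // where view i starts inside *values
  std::vector<int32_t> sizes;     // how many elements view i covers
  // Shared child array; std::shared_ptr stands in for a ref-counted buffer.
  std::shared_ptr<const std::vector<int32_t>> values;

  int64_t length() const { return static_cast<int64_t>(sizes.size()); }

  bool IsValid(int64_t i) const {
    assert(i >= 0 && i < length());
    return (validity[i / 8] >> (i % 8)) & 1;
  }

  // Materializes view i, or returns nullopt for a NULL view. A valid view
  // with sizes[i] == 0 is an empty list, and offsets[i] is never read.
  std::optional<std::vector<int32_t>> Get(int64_t i) const {
    if (!IsValid(i)) return std::nullopt;
    std::vector<int32_t> out;
    for (int32_t k = 0; k < sizes[i]; ++k) {
      out.push_back((*values)[offsets[i] + k]);
    }
    return out;
  }
};

// Selects the views at positions `keep` (any order, repeats allowed).
// Only the validity/offsets/sizes buffers are rebuilt; the child values
// are shared with the input rather than copied or compacted.
ListViewArray Filter(const ListViewArray& in, const std::vector<int64_t>& keep) {
  ListViewArray out;
  out.values = in.values;
  out.validity.assign((keep.size() + 7) / 8, 0);
  for (size_t j = 0; j < keep.size(); ++j) {
    const int64_t i = keep[j];
    if (in.IsValid(i)) {
      out.validity[j / 8] |= static_cast<uint8_t>(1u << (j % 8));
    }
    out.offsets.push_back(in.offsets[i]);
    out.sizes.push_back(in.sizes[i]);
  }
  return out;
}
```

For example, with values `[1, 2, 3, 4, 5]`, offsets `[3, 0, 0, 0, 1]`, sizes `[2, 0, 0, 3, 2]` and view 1 marked null, the array reads as `[[4, 5], null, [], [1, 2, 3], [2, 3]]`: the offsets are unordered and the last two views overlap. Since the views need not be ordered or disjoint, `Filter` never touches the child array. With the existing list layout, whose offsets must be monotonic, selecting an arbitrary subset would instead require copying and compacting the selected values.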