Hey Alessandro, take a look at the top-level docs on ValueVector:

https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/ValueVector.html

Specifically the following:

   - values need to be written in order (e.g. index 0, 1, 2, 5)
   - null vectors start with all values as null before writing anything
   - for variable-width types, the offset vector should be all zeros before
     writing
   - you must call setValueCount before a vector can be read
   - you should never write to a vector once it has been read.
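
To make the lifecycle above concrete, here is a minimal sketch of the write-then-read pattern with an IntVector. It assumes arrow-vector plus one of the arrow-memory implementations (netty or unsafe) is on the classpath; the class name, vector name, and values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;

// Illustrative only: class and method names are hypothetical.
public class VectorLifecycleSketch {

    // Writes indices 0, 1, 2, 5 in ascending order, sets the value count,
    // then reads every slot back (null for slots that were never written).
    public static List<Integer> writeThenRead() {
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
             IntVector vector = new IntVector("example", allocator)) {
            // Freshly allocated buffers are zeroed, so every slot starts as null.
            vector.allocateNew(6);

            // Write phase: ascending indices only; skipped slots 3 and 4 stay null.
            vector.set(0, 10);
            vector.set(1, 20);
            vector.set(2, 30);
            vector.set(5, 50);

            // Must be called before the vector is read.
            vector.setValueCount(6);

            // Read phase: no further writes from here on.
            List<Integer> out = new ArrayList<>();
            for (int i = 0; i < vector.getValueCount(); i++) {
                out.add(vector.isNull(i) ? null : vector.get(i));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThenRead()); // prints [10, 20, 30, null, null, 50]
    }
}
```

The try-with-resources block closes the vector before the allocator, which releases the buffers before the allocator checks for leaks.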


In a perfect world we would have made these rules explicit in the object
hierarchy and its behavior, but unfortunately we don't live in that world.
I'll also say that these rules are stricter than what is technically safe.
For example, in one project we used a BigIntVector to maintain and update
sums when doing hash aggregations, which involves a read-modify-write on
individual cells, out of order. That said, that's advanced usage and most
people should stick with the guidelines above.
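
For the curious, a rough sketch of what that kind of out-of-order read-modify-write can look like. This is not the code from that project; the class, the sumByGroup helper, and the group-id layout are all assumptions for illustration, and it needs the same Arrow dependencies as any other Java vector code.

```java
import java.util.Arrays;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.BigIntVector;

// Illustrative only: not the original project's implementation.
public class RunningSumsSketch {

    // Hypothetical helper: adds values[i] into the accumulator slot groupIds[i].
    public static long[] sumByGroup(int groupCount, int[] groupIds, long[] values) {
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
             BigIntVector sums = new BigIntVector("sums", allocator)) {
            sums.allocateNew(groupCount);

            // Initialize every accumulator to a non-null zero, in index order.
            for (int g = 0; g < groupCount; g++) {
                sums.set(g, 0L);
            }
            sums.setValueCount(groupCount);

            // Read-modify-write on individual cells, in whatever order rows arrive.
            // This deliberately steps outside the write-once, in-order guidelines.
            for (int i = 0; i < groupIds.length; i++) {
                int g = groupIds[i];
                sums.set(g, sums.get(g) + values[i]);
            }

            // Copy the totals out before the vector's buffers are released.
            long[] result = new long[groupCount];
            for (int g = 0; g < groupCount; g++) {
                result[g] = sums.get(g);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        long[] totals = sumByGroup(3, new int[] {2, 0, 2, 1, 0}, new long[] {5, 1, 7, 3, 4});
        System.out.println(Arrays.toString(totals)); // prints [5, 3, 12]
    }
}
```

This stays safe only because the vector is fixed-width, every slot is initialized up front, and the value count never changes; the same trick does not carry over to variable-width vectors, where rewriting a cell would disturb the offsets of everything after it.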

On Wed, Nov 3, 2021 at 5:50 AM Alessandro Molina <
alessan...@ursacomputing.com> wrote:

> I recently noticed that in the Java implementation we expose set/setSafe
> functions that allow mutating Arrow Arrays [1].
>
> This seems to be at odds with the general design of the C++ library (and
> consequently the Python and R ones), where Arrays are immutable and can be
> modified only through compute functions returning copies.
>
> The Arrow Format documentation [2] seems to suggest that mutation of data
> structures is possible and left as an implementation detail, but given that
> some users might be willing to mutate existing structures (for example to
> avoid incurring the memory cost of copies when dealing with big arrays),
> I think there might be reasons for both allowing mutation of Arrays and
> disallowing it. It probably makes sense to ensure that all the
> implementations agree on such a fundamental choice to avoid setting
> expectations on users' side that might not apply when they cross language
> barriers.
>
> [1] https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/SmallIntVector.html#setSafe-int-int-
> [2] https://arrow.apache.org/docs/format/Columnar.html
>
