Hey Alessandro, take a look at the top-level docs on ValueVector: https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/ValueVector.html
Specifically the following:

- values need to be written in order (e.g. index 0, 1, 2, 5)
- null vectors start with all values as null before writing anything
- for variable width types, the offset vector should be all zeros before writing
- you must call setValueCount before a vector can be read
- you should never write to a vector once it has been read.

In a perfect world we would have done a better job in the object hierarchy/behavior of making this explicit but we don't live in that world, unfortunately.

I'll also say that these rules are actually more stringent than what is technically safe. For example, in one project we would use a BigIntVector to maintain and update sums when doing hash aggregations (which includes a read-modify-write on individual cells out of order). That being said, that's advanced usage and most people should stick with the guidelines above.

On Wed, Nov 3, 2021 at 5:50 AM Alessandro Molina <alessan...@ursacomputing.com> wrote:

> I recently noticed that in the Java implementation we expose a set/setSafe
> function that allows to mutate Arrow Arrays [1]
>
> This seems to be at odds with the general design of the C++ (and by
> consequence Python and R) library where Arrays are immutable and can be
> modified only through compute functions returning copies.
>
> The Arrow Format documentation [2] seems to suggest that mutation of data
> structures is possible and left as an implementation detail, but given that
> some users might be willing to mutate existing structures (for example to
> avoid incurring in the memory cost of copies when dealing with big arrays)
> I think there might be reasons for both allowing mutation of Arrays and
> disallowing it. It probably makes sense to ensure that all the
> implementations agree on such a fundamental choice to avoid setting
> expectations on users' side that might not apply when they cross language
> barriers.
>
> [1]
> https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/SmallIntVector.html#setSafe-int-int-
> [2] https://arrow.apache.org/docs/format/Columnar.html
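
To make the ValueVector guidelines from the reply above a bit more concrete, here is a minimal sketch in plain Java. It is a toy model and not Arrow's API: the class StrictLongVector and all of its internals are hypothetical, and it only shows what enforcing the write-in-order, null-by-default, setValueCount-before-read and no-write-after-read rules at runtime could look like for a fixed-width vector. The offset-vector rule for variable width types is not modeled.

```java
import java.util.Arrays;

// Toy model only: StrictLongVector is hypothetical and is NOT part of Arrow.
final class StrictLongVector {
    private long[] values = new long[4];
    private boolean[] valid = new boolean[4]; // all false: every slot starts out null
    private int lastWritten = -1;             // highest index written so far
    private int valueCount = -1;              // -1 means setValueCount has not been called
    private boolean hasBeenRead = false;

    // Grow both buffers so that indices [0, size) are addressable.
    private void ensureCapacity(int size) {
        if (size > values.length) {
            int newLength = Math.max(size, values.length * 2);
            values = Arrays.copyOf(values, newLength);
            valid = Arrays.copyOf(valid, newLength); // new slots are false, i.e. null
        }
    }

    void set(int index, long value) {
        if (hasBeenRead) {
            throw new IllegalStateException("vector was already read; writes are not allowed");
        }
        if (index <= lastWritten) {
            throw new IllegalArgumentException("out-of-order write at index " + index);
        }
        ensureCapacity(index + 1);
        values[index] = value;
        valid[index] = true;
        lastWritten = index;
        valueCount = -1; // a new write means the count must be set again before reading
    }

    void setValueCount(int count) {
        if (count <= lastWritten) {
            throw new IllegalArgumentException("value count must cover every written index");
        }
        ensureCapacity(count);
        valueCount = count;
    }

    // Shared guard for all reads; marks the vector as read only if the read is legal.
    private void checkReadable(int index) {
        if (valueCount < 0) {
            throw new IllegalStateException("call setValueCount before reading");
        }
        if (index < 0 || index >= valueCount) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        hasBeenRead = true;
    }

    boolean isNull(int index) {
        checkReadable(index);
        return !valid[index];
    }

    long get(int index) {
        checkReadable(index);
        if (!valid[index]) {
            throw new IllegalStateException("value at index " + index + " is null");
        }
        return values[index];
    }

    public static void main(String[] args) {
        StrictLongVector v = new StrictLongVector();
        v.set(0, 10L);
        v.set(1, 20L);
        v.set(2, 30L);
        v.set(5, 40L);        // gaps are fine; indices 3 and 4 stay null
        v.setValueCount(6);   // required before any read
        long sum = 0;
        for (int i = 0; i < 6; i++) {
            if (!v.isNull(i)) {
                sum += v.get(i);
            }
        }
        System.out.println("sum of non-null values: " + sum);
    }
}
```

This sketch deliberately rejects the out-of-order read-modify-write pattern mentioned above for hash aggregations, since it models the conservative guidelines rather than everything that is technically safe.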