Hi Azim, I think we should be aware of two distinct concepts:
1. vector capacity: the maximum number of values that can be stored in the vector without reallocation.
2. vector length: the number of values actually filled in the vector.

For any valid vector, we always have vector length <= vector capacity. The allocateNew method expands the vector capacity, but it does not fill in any values, so it does not affect the vector length.

For the code above, if the vector length is 0, the value of isSet(index) is undefined for every index (including 0). So throwing an exception is the correct behavior.

Hope this answers your question.

Best,
Liya Fan

On Fri, Nov 8, 2019 at 5:38 PM azim afroozeh <afrooz...@gmail.com> wrote:

> Hi everyone,
>
> I have a question about the Java implementation of Apache Arrow. Should we
> always call setValueCount after creating a vector with allocateNew()?
>
> I can see that in some tests setValueCount is called immediately
> after allocateNew, for example here:
>
> https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java#L285
>
> but not in other tests:
>
> https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java#L792
>
> To illustrate the problem further, suppose I change the isSet(int index) function as
> follows:
>
>     public int isSet(int index) {
>       if (valueCount == 0) {
>         return 0;
>       }
>       final int byteIndex = index >> 3;
>       final byte b = validityBuffer.getByte(byteIndex);
>       final int bitIndex = index & 7;
>       return (b >> bitIndex) & 0x01;
>     }
>
> Many tests will then fail, although logically they should not: if
> valueCount is 0, isSet should return zero for every index.
> The problem comes from the allocateNew method, which does not initialize the
> valueCount variable.
>
> One potential solution to this problem is to initialize valueCount
> in the allocateNew function, as I did here:
>
> https://github.com/azimafroozeh/arrow/commit/4281613b7ed1370252a155192f12b9bca494dbeb
>
> The classes BaseVariableWidthVector and BaseFixedWidthVector both have an
> allocateNew function that would need to be changed. Is this an acceptable
> approach, or am I missing some semantics?
>
> Thanks,
>
> Azim Afroozeh
>
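To make the capacity/length distinction concrete, here is a minimal, self-contained Java sketch. ToyIntVector is a hypothetical class written only for illustration: it borrows the method names discussed in this thread (allocateNew, setValueCount, isSet, getValueCapacity), but it is not Arrow's BaseFixedWidthVector. The bounds check in its isSet is an assumption chosen to model "throw for any index outside the vector length".

```java
import java.util.Arrays;

/** Hypothetical illustration only; this is NOT Arrow's vector implementation. */
public class CapacityVsLength {

    static final class ToyIntVector {
        private int[] values = new int[0];      // data buffer
        private byte[] validity = new byte[0];  // one validity bit per slot
        private int valueCount = 0;             // vector length: number of filled slots

        /** Vector capacity: slots available without reallocation. */
        int getValueCapacity() {
            return values.length;
        }

        /** Vector length. */
        int getValueCount() {
            return valueCount;
        }

        /** Grows the buffers to hold `capacity` values; the length is left unchanged. */
        void allocateNew(int capacity) {
            if (capacity < valueCount) {
                throw new IllegalArgumentException("capacity must be >= valueCount");
            }
            values = Arrays.copyOf(values, capacity);                 // new slots are 0
            validity = Arrays.copyOf(validity, (capacity + 7) / 8);   // new bits are 0
        }

        /** Writes a value and marks it valid; in this sketch it does not change the length. */
        void set(int index, int value) {
            if (index < 0 || index >= getValueCapacity()) {
                throw new IndexOutOfBoundsException(
                        "index " + index + " >= capacity " + getValueCapacity());
            }
            values[index] = value;
            validity[index >> 3] |= (byte) (1 << (index & 7));
        }

        /** Declares how many slots, starting at 0, hold meaningful data. */
        void setValueCount(int count) {
            if (count < 0 || count > getValueCapacity()) {
                throw new IllegalArgumentException("count must be in [0, capacity]");
            }
            valueCount = count;
        }

        /** Reads a validity bit; indices outside [0, length) are rejected. */
        int isSet(int index) {
            if (index < 0 || index >= valueCount) {
                throw new IndexOutOfBoundsException(
                        "index " + index + " >= length " + valueCount);
            }
            return (validity[index >> 3] >> (index & 7)) & 0x01;
        }
    }

    public static void main(String[] args) {
        ToyIntVector v = new ToyIntVector();
        v.allocateNew(16);  // capacity grows, length stays 0
        System.out.println("capacity=" + v.getValueCapacity() + " length=" + v.getValueCount());

        try {
            v.isSet(0);     // undefined for an empty vector, so it throws
        } catch (IndexOutOfBoundsException e) {
            System.out.println("isSet(0) on an empty vector: " + e.getMessage());
        }

        v.set(0, 42);
        v.set(2, 7);
        v.setValueCount(3); // only now do slots 0..2 count as part of the vector
        System.out.println("length=" + v.getValueCount()
                + " isSet: " + v.isSet(0) + v.isSet(1) + v.isSet(2));
    }
}
```

Running main should print "capacity=16 length=0", then "isSet(0) on an empty vector: index 0 >= length 0", then "length=3 isSet: 101". In this sketch, set() never changes the length; only setValueCount() does, so reading values back requires declaring the length first.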