valueCount includes both null and non-null values. Perhaps a better name for the method would have been setSize or setLength.
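
To make that concrete, here is a rough, self-contained sketch of the idea. It is a toy model, not Arrow's actual implementation: the class ToyNullableIntVector and its getNullCount helper are hypothetical names I made up for illustration, and it only mirrors the validity-bit arithmetic quoted further down the thread. The point is that the value count is the logical length of the vector, so slots that were never set still count toward it and simply read back as null.

```java
// Toy model only: NOT Arrow's implementation. All names are hypothetical.
final class ToyNullableIntVector {
    private final byte[] validity; // one bit per slot: 1 = set, 0 = null
    private final int[] data;      // slot values (meaningful only where the bit is 1)
    private int valueCount = 0;    // logical length; counts null slots too

    ToyNullableIntVector(int capacity) {
        this.validity = new byte[(capacity + 7) / 8];
        this.data = new int[capacity];
    }

    int getValueCapacity() { return data.length; }

    int getValueCount() { return valueCount; }

    // Writes a value and marks the slot as non-null. Does not touch valueCount.
    void set(int index, int value) {
        data[index] = value;
        validity[index >> 3] |= (byte) (1 << (index & 7));
    }

    // Declares how many leading slots are part of the vector, null or not.
    void setValueCount(int count) {
        if (count < 0 || count > data.length) {
            throw new IllegalArgumentException("count out of range: " + count);
        }
        valueCount = count;
    }

    // Same bit arithmetic as the isSet snippet quoted in the thread.
    int isSet(int index) {
        return (validity[index >> 3] >> (index & 7)) & 0x01;
    }

    // Returns null for slots whose validity bit is 0.
    Integer getObject(int index) {
        return isSet(index) == 1 ? data[index] : null;
    }

    // Number of null slots within the logical length.
    int getNullCount() {
        int nulls = 0;
        for (int i = 0; i < valueCount; i++) {
            if (isSet(i) == 0) {
                nulls++;
            }
        }
        return nulls;
    }

    public static void main(String[] args) {
        int capacity = 9;
        ToyNullableIntVector vector = new ToyNullableIntVector(capacity);
        for (int i = 0; i < capacity; i++) {
            if (i % 3 == 0) {
                continue; // leave every third slot null, as in the test
            }
            vector.set(i, i);
        }
        vector.setValueCount(capacity);
        System.out.println("valueCount=" + vector.getValueCount()); // valueCount=9
        System.out.println("nullCount=" + vector.getNullCount());   // nullCount=3
        System.out.println("slot 0=" + vector.getObject(0));        // slot 0=null
    }
}
```

In this model, setting the value count to the capacity says nothing about how many slots are non-null; it only fixes the length over which null and non-null slots are counted.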

On Thursday, November 14, 2019, azim afroozeh <afrooz...@gmail.com> wrote:

> Thanks for your answer. I have one more question. In this test function for
> example
> (https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java#L1524):
>
> there is a for loop which tries to fill in some values but not all values.
> It leaves some of them as null.
>
>   for (int i = 0; i < capacity; i++) {
>     if (i % 3 == 0) {
>       continue;
>     }
>     byte[] b = Integer.toString(i).getBytes();
>     vector.setSafe(i, b, 0, b.length);
>   }
>
> Then there is setValueCount function which set the valueCount.
>
>   vector.setValueCount(capacity);
>
> I think by setting the valueCount to Capacity it means that all values are
> filled in and there is not any null values in the valueVector. But Later in
> the following loop, it checks whether the unset values are null which they
> should not be null because ValueCount is equal to Capacity (All values are
> set).
>
>   for (int i = 0; i < capacity; i++) {
>     if (i % 3 == 0) {
>       assertNull(vector.getObject(i));
>     } else {
>       assertEquals("unexpected value at index: " + i,
>           Integer.toString(i), vector.getObject(i).toString());
>     }
>   }
>
> Am I missing something here?
>
> Thanks
>
> Azim
>
> On Thu, Nov 14, 2019 at 11:56 AM Fan Liya <liya.fa...@gmail.com> wrote:
>
> > Hi Azim,
> >
> > According to the current API, after filling in some values, you have to
> > set the value count manually (through the setValueCount method).
> > Otherwise, the value count remains 0.
> >
> > Best,
> > Liya Fan
> >
> >
> > On Thu, Nov 14, 2019 at 6:33 PM azim afroozeh <afrooz...@gmail.com> wrote:
> >
> > > Thanks for your answer. So the valueCount shows the number of data
> > > filled in the vector.
> > >
> > > Then I would like to ask you why the valueCount after setting some
> > > values is 0? for example:
> > > (https://github.com/apache/arrow/blob/3fbbcdaf77a9e354b6bd07ec1fd1dac005a505c9/java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java#L609)
> > >
> > >   System.out.print(vector.getValueCount()); //prints 0
> > >   /* populate the vector */
> > >   vector.set(0, 100.5f);
> > >   vector.set(2, 201.5f);
> > >   vector.set(4, 300.3f);
> > >   vector.set(6, 423.8f);
> > >   vector.set(8, 555.6f);
> > >   vector.set(10, 66.6f);
> > >   vector.set(12, 78.8f);
> > >   vector.set(14, 89.5f);
> > >   System.out.print(vector.getValueCount()); //prints 0
> > >
> > > If I add these two print lines, they will print 0.
> > >
> > > Also If I add the following code to isSet again some tests fail.
> > >
> > >   if (valueCount == getValueCapacity()) { return 1; }
> > >
> > > Thanks,
> > >
> > > Azim Afroozeh
> > >
> > > On Fri, Nov 8, 2019 at 10:57 AM Fan Liya <liya.fa...@gmail.com> wrote:
> > >
> > > > Hi Azim,
> > > >
> > > > I think we should be aware of two distinct concepts:
> > > >
> > > > 1. vector capacity: the max number of values that can be stored in
> > > >    the vector, without reallocation
> > > > 2. vector length: the number of values actually filled in the vector
> > > >
> > > > For any valid vector, we always have vector length <= vector capacity.
> > > >
> > > > The allocateNew method expands the vector capacity, but it does not
> > > > fill in any value, so it does not affect the the vector length.
> > > >
> > > > For the code above, if the vector length is 0, the value of
> > > > isSet(index) (where index > 0) should be undefined. So throwing an
> > > > exception is the correct behavior.
> > > >
> > > > Hope this answers your question.
> > > >
> > > > Best,
> > > > Liya Fan
> > > >
> > > >
> > > > On Fri, Nov 8, 2019 at 5:38 PM azim afroozeh <afrooz...@gmail.com> wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > I have a question about the Java implementation of Apache Arrow.
> > > > > Should we always call setValueCount after creating a vector with
> > > > > allocateNew()?
> > > > >
> > > > > I can see that in some tests where setValueCount is called
> > > > > immediately after allocateNew. For example here:
> > > > >
> > > > > https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java#L285
> > > > > ,
> > > > > but not in other tests:
> > > > >
> > > > > https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java#L792
> > > > > .
> > > > >
> > > > > To illustrate the problem more, if I change the isSet(int
> > > > > index)function as follows:
> > > > >
> > > > >   public int isSet(int index) {
> > > > >     if (valueCount == 0) {
> > > > >       return 0;
> > > > >     }
> > > > >     final int byteIndex = index >> 3;
> > > > >     final byte b = validityBuffer.getByte(byteIndex);
> > > > >     final int bitIndex = index & 7;
> > > > >     return (b >> bitIndex) & 0x01;
> > > > >   }
> > > > >
> > > > > Many tests will fail, while logically they should not because if
> > > > > the valueCount is 0 then isSet returned value for every index
> > > > > should be zero. The problem comes from the allocateNew method
> > > > > which does not initialize the valueCount variable.
> > > > >
> > > > > One potential solution to this problem is to initialize the
> > > > > valueCount in allocateNew function, as I did here:
> > > > >
> > > > > https://github.com/azimafroozeh/arrow/commit/4281613b7ed1370252a155192f12b9bca494dbeb
> > > > > .
> > > > > The classes BaseVariableWidthVector and BaseFixedWidthVector, both
> > > > > have allocateNew function that needs to be changed. Is this an
> > > > > acceptable approach? or am I missing some semantics?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Azim Afroozeh