valueCount includes both null and non-null values. Perhaps a better name
for the method would have been setSize or setLength.
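
To make that concrete, here is a minimal sketch in plain Java. It is a toy model, not Arrow's implementation: ToyNullableVector and its internals are made up for illustration, and it only assumes the general idea of a validity bitmap that is tracked separately from the value count. It follows the same pattern as the test quoted below: every third slot is skipped, and the value count is then set to the capacity.

```java
import java.util.BitSet;

// Toy stand-in for a nullable vector. NOT Arrow code: the class name and
// internals are hypothetical and only mirror the idea of a validity bitmap
// plus a separately tracked value count.
class ToyNullableVector {
  private final String[] values;
  private final BitSet validity;  // bit i set => slot i holds a non-null value
  private int valueCount = 0;     // logical length, nulls included

  ToyNullableVector(int capacity) {
    values = new String[capacity];
    validity = new BitSet(capacity);
  }

  // Writing a value marks the slot as non-null; it does not change valueCount.
  void set(int index, String value) {
    values[index] = value;
    validity.set(index);
  }

  // Declares how many slots (null or not) belong to the vector.
  void setValueCount(int count) {
    if (count < 0 || count > values.length) {
      throw new IllegalArgumentException("count out of range: " + count);
    }
    valueCount = count;
  }

  int getValueCount() {
    return valueCount;
  }

  // Slots at or beyond valueCount are not part of the vector in this toy model.
  boolean isNull(int index) {
    if (index < 0 || index >= valueCount) {
      throw new IndexOutOfBoundsException("index: " + index);
    }
    return !validity.get(index);
  }

  String getObject(int index) {
    return isNull(index) ? null : values[index];
  }

  // Number of null slots among the first valueCount slots.
  int getNullCount() {
    return valueCount - validity.get(0, valueCount).cardinality();
  }

  public static void main(String[] args) {
    int capacity = 9;
    ToyNullableVector vector = new ToyNullableVector(capacity);
    for (int i = 0; i < capacity; i++) {
      if (i % 3 == 0) {
        continue;  // leave slots 0, 3 and 6 unset, i.e. null
      }
      vector.set(i, Integer.toString(i));
    }
    System.out.println(vector.getValueCount());  // still 0: set() does not touch it
    vector.setValueCount(capacity);
    System.out.println(vector.getValueCount());  // 9: nulls are counted too
    System.out.println(vector.getNullCount());   // 3
  }
}
```

In this sketch the value count goes from 0 to 9 only when setValueCount is called, and 3 of those 9 slots remain null, so a value count equal to the capacity says nothing about whether every slot was written.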

On Thursday, November 14, 2019, azim afroozeh <afrooz...@gmail.com> wrote:

> Thanks for your answer. I have one more question. In this test function, for
> example:
>
> https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java#L1524
>
> there is a for loop that fills in some values but not all of them.
> It leaves the rest as null.
>
>       for (int i = 0; i < capacity; i++) {
>         if (i % 3 == 0) {
>           continue;
>         }
>         byte[] b = Integer.toString(i).getBytes();
>         vector.setSafe(i, b, 0, b.length);
>       }
> Then setValueCount is called to set the valueCount:
>
>       vector.setValueCount(capacity);
>
> I think setting the valueCount to capacity means that all values are
> filled in and there are no null values in the value vector. But later, in
> the following loop, the test checks that the unset values are null, which they
> should not be, because valueCount is equal to capacity (all values are
> set).
>
>       for (int i = 0; i < capacity; i++) {
>         if (i % 3 == 0) {
>           assertNull(vector.getObject(i));
>         } else {
>           assertEquals("unexpected value at index: " + i,
>               Integer.toString(i), vector.getObject(i).toString());
>         }
>       }
>
> Am I missing something here?
>
> Thanks
>
> Azim
>
> On Thu, Nov 14, 2019 at 11:56 AM Fan Liya <liya.fa...@gmail.com> wrote:
>
> > Hi Azim,
> >
> > According to the current API, after filling in some values, you have to set
> > the value count manually (through the setValueCount method).
> > Otherwise, the value count remains 0.
> >
> > Best,
> > Liya Fan
> >
> >
> > On Thu, Nov 14, 2019 at 6:33 PM azim afroozeh <afrooz...@gmail.com> wrote:
> >
> > > Thanks for your answer. So the valueCount shows the number of values filled
> > > in the vector.
> > >
> > > Then I would like to ask why the valueCount is still 0 after setting some
> > > values. For example:
> > >
> > > https://github.com/apache/arrow/blob/3fbbcdaf77a9e354b6bd07ec1fd1dac005a505c9/java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java#L609
> > >
> > >
> > > System.out.print(vector.getValueCount()); // prints 0
> > > /* populate the vector */
> > > vector.set(0, 100.5f);
> > > vector.set(2, 201.5f);
> > > vector.set(4, 300.3f);
> > > vector.set(6, 423.8f);
> > > vector.set(8, 555.6f);
> > > vector.set(10, 66.6f);
> > > vector.set(12, 78.8f);
> > > vector.set(14, 89.5f);
> > > System.out.print(vector.getValueCount()); // prints 0
> > >
> > >
> > > If I add these two print statements, both print 0.
> > >
> > >
> > > Also, if I add the following code to isSet, some tests fail again:
> > >
> > > if (valueCount == getValueCapacity()) {
> > >   return 1;
> > > }
> > >
> > >
> > >
> > > Thanks,
> > >
> > >
> > > Azim Afroozeh
> > >
> > > On Fri, Nov 8, 2019 at 10:57 AM Fan Liya <liya.fa...@gmail.com> wrote:
> > >
> > > > Hi Azim,
> > > >
> > > > I think we should be aware of two distinct concepts:
> > > >
> > > > 1. vector capacity: the max number of values that can be stored in the
> > > > vector, without reallocation
> > > > 2. vector length: the number of values actually filled in the vector
> > > >
> > > > For any valid vector, we always have vector length <= vector capacity.
> > > >
> > > > The allocateNew method expands the vector capacity, but it does not fill in
> > > > any value, so it does not affect the vector length.
> > > >
> > > > For the code above, if the vector length is 0, the value of isSet(index)
> > > > (where index > 0) should be undefined. So throwing an exception is the
> > > > correct behavior.
> > > >
> > > > Hope this answers your question.
> > > >
> > > > Best,
> > > > Liya Fan
> > > >
> > > >
> > > > On Fri, Nov 8, 2019 at 5:38 PM azim afroozeh <afrooz...@gmail.com> wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > > I have a question about the Java implementation of Apache Arrow. Should we
> > > > > > always call setValueCount after creating a vector with allocateNew()?
> > > > > >
> > > > > > I can see that in some tests setValueCount is called immediately
> > > > > > after allocateNew. For example here:
> > > > > >
> > > > > > https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java#L285
> > > > > >
> > > > > > but not in other tests:
> > > > > >
> > > > > > https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java#L792
> > > > > >
> > > > > > To illustrate the problem further, if I change the isSet(int index) function as
> > > > > > follows:
> > > > >
> > > > > public int isSet(int index) {
> > > > >  if (valueCount == 0) {
> > > > >  return 0;
> > > > >  }
> > > > >  final int byteIndex = index >> 3;
> > > > >  final byte b = validityBuffer.getByte(byteIndex);
> > > > >  final int bitIndex = index & 7;
> > > > >  return (b >> bitIndex) & 0x01;
> > > > > }
> > > > >
> > > > > > Many tests will fail, while logically they should not, because if the
> > > > > > valueCount is 0 then the value returned by isSet should be zero for every
> > > > > > index. The problem comes from the allocateNew method, which does not
> > > > > > initialize the valueCount variable.
> > > > > >
> > > > > > One potential solution to this problem is to initialize the valueCount
> > > > > > in the allocateNew function, as I did here:
> > > > > >
> > > > > > https://github.com/azimafroozeh/arrow/commit/4281613b7ed1370252a155192f12b9bca494dbeb
> > > > > >
> > > > > > The classes BaseVariableWidthVector and BaseFixedWidthVector both have
> > > > > > an allocateNew function that needs to be changed. Is this an acceptable
> > > > > > approach? Or am I missing some semantics?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Azim Afroozeh
> > > > >
> > > >
> > >
> >
>
