Hi,

As I pointed out in my previous email, the C++ code has an optimization for the cases where (i) there are no null values, or (ii) all values are null. The Java code path does not have it. I am trying to implement this feature. It would look something like:

    public int isSet(int index) {
      if (nullCount == valueCount) {
        return 0;
      } else if (nullCount == 0) {
        return 1;
      } else {
        final int byteIndex = index >> 3;
        final byte b = validityBuffer.getByte(byteIndex);
        final int bitIndex = index & 7;
        return (b >> bitIndex) & 0x01;
      }
    }

The current problem is that "nullCount" is not explicitly tracked in the Java code. It is obtained by calling

    public int getNullCount() {
      return BitVectorHelper.getNullCount(validityBuffer, valueCount);
    }

which recomputes the count from the validity buffer, so it is too expensive to call every time in isSet().

I see that the source code has a TODO about this:

https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BaseFixedWidthVector.java#L75

which says:

"Right now BaseValueVector is the top level base class for other vector types in ValueVector hierarchy (non-nullable) and those vectors have not yet been refactored/removed so moving things to the top class as of now is not a good idea."

(1) I am not sure what this means. Can someone explain? Why is it not a good idea?

(2) I think there is another branch, AbstractContainerVector, which does not have BaseValueVector as its top-level base class. AbstractContainerVector implements ValueVector (which is an interface). In the C++ code, data and bitmap are both stored in the top-level Array class, which is probably not possible in the Java implementation. However, we can move the bitmap operations to the "BaseValueVector" class. I don't know what to do about the AbstractContainerVector path. Perhaps some code needs to be duplicated there.

(3) Is this the right design choice? Any inputs?

Thanks,
--
Animesh
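
P.S. To make the idea concrete, here is a minimal, self-contained sketch of what "explicitly tracking nullCount" could look like. It is not Arrow code: TrackedValidity is a hypothetical class name, a plain byte[] stands in for the validity buffer, and setValid()/setNull() are illustrative names. It assumes that every write to the bitmap goes through these two methods, which is exactly the part I am unsure about for the real vectors.

```java
/** Hypothetical stand-in for a vector's validity bitmap; not an Arrow class. */
final class TrackedValidity {
    private final byte[] validityBuffer; // assumption: plain byte[] instead of Arrow's buffer
    private final int valueCount;
    private int nullCount;               // updated on every write, never recomputed

    TrackedValidity(int valueCount) {
        this.valueCount = valueCount;
        this.validityBuffer = new byte[(valueCount + 7) >> 3];
        this.nullCount = valueCount;     // all bits start at 0, so every value is null
    }

    /** Reads the validity bit for index directly from the buffer. */
    private int readBit(int index) {
        return (validityBuffer[index >> 3] >> (index & 7)) & 0x01;
    }

    /** Marks index as non-null; adjusts nullCount only if the bit changes. */
    void setValid(int index) {
        if (readBit(index) == 0) {
            validityBuffer[index >> 3] |= (byte) (1 << (index & 7));
            nullCount--;
        }
    }

    /** Marks index as null; adjusts nullCount only if the bit changes. */
    void setNull(int index) {
        if (readBit(index) == 1) {
            validityBuffer[index >> 3] &= (byte) ~(1 << (index & 7));
            nullCount++;
        }
    }

    /** Same shape as the proposed isSet(): two fast paths, then the bit lookup. */
    int isSet(int index) {
        if (nullCount == valueCount) {
            return 0;                    // all values are null
        } else if (nullCount == 0) {
            return 1;                    // no value is null
        }
        return readBit(index);
    }

    int getNullCount() {
        return nullCount;
    }
}
```

The catch is that the counter is only correct as long as nobody writes to the buffer behind its back. Any path that fills the validity buffer wholesale would have to recompute the count once after the write, so I am not sure how cleanly this maps onto the existing class hierarchy.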