Hi all,

I am looking for a quick way to look up the total row count of a data set 
stored in Arrow’s random access file format using the Java API. Basically, a 
quicker way to do this:

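// reader is an instance of ArrowFileReader
VectorSchemaRoot root = reader.getVectorSchemaRoot();
List<ArrowBlock> blocks = reader.getRecordBlocks();
long nRows = 0;
for (ArrowBlock block : blocks) {
    // loadRecordBatch reads the whole batch (metadata and body) into root
    reader.loadRecordBatch(block);
    nRows += root.getRowCount();
}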

My understanding is that the above snippet loads the entire data set instead 
of just the record batch metadata.
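Something along the following lines is what I have in mind, as an untested sketch under a few assumptions. It assumes that each ArrowBlock from the file footer points at an encapsulated record batch message whose flatbuffer metadata already stores the batch's row count (the `length` field of `RecordBatch`), so only the footer and each batch's metadata need to be read, never the bodies. It uses the generated flatbuffer classes in `org.apache.arrow.flatbuf`; the class `ArrowRowCount` and its methods are hypothetical names of my own, not part of Arrow.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

import org.apache.arrow.flatbuf.Message;
import org.apache.arrow.flatbuf.MessageHeader;
import org.apache.arrow.flatbuf.RecordBatch;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowFileReader;
import org.apache.arrow.vector.ipc.message.ArrowBlock;

// Hypothetical helper: sums row counts from record batch metadata only.
public final class ArrowRowCount {

  public static long countRows(Path path) throws IOException {
    try (BufferAllocator allocator = new RootAllocator();
         FileChannel channel = FileChannel.open(path, StandardOpenOption.READ);
         ArrowFileReader reader = new ArrowFileReader(channel, allocator)) {
      long total = 0;
      // getRecordBlocks() only needs the footer, which lists each batch's offset and lengths.
      for (ArrowBlock block : reader.getRecordBlocks()) {
        total += readBatchLength(channel, block);
      }
      return total;
    }
  }

  private static long readBatchLength(FileChannel channel, ArrowBlock block) throws IOException {
    // Assumption: the block's metadata length covers the length prefix, the
    // flatbuffer Message, and padding, but not the batch body.
    ByteBuffer buf = ByteBuffer.allocate(block.getMetadataLength()).order(ByteOrder.LITTLE_ENDIAN);
    long pos = block.getOffset();
    while (buf.hasRemaining()) {
      // Positional read: does not move the channel position the reader relies on.
      int n = channel.read(buf, pos);
      if (n < 0) {
        throw new IOException("Unexpected end of file in batch metadata");
      }
      pos += n;
    }
    buf.flip();
    // Newer writers put a 0xFFFFFFFF continuation marker before the int32 length.
    int prefixSize = buf.getInt(0) == -1 ? 8 : 4;
    buf.position(prefixSize);
    Message message = Message.getRootAsMessage(buf.slice());
    if (message.headerType() != MessageHeader.RecordBatch) {
      throw new IOException("Block does not contain a record batch");
    }
    RecordBatch batch = (RecordBatch) message.header(new RecordBatch());
    return batch.length(); // row count stored in the batch metadata
  }
}
```

Usage would then be a single call such as `long nRows = ArrowRowCount.countRows(Paths.get("data.arrow"));`, with the file path being a placeholder.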

To give you some context, I am looking into using Arrow for IPC between a JVM 
(using a custom data format) and a Python interpreter (using PyArrow/Pandas). 
While the streaming API might be a better tool for this job, I started out 
using files to keep things simple.

Any help would be greatly appreciated; maybe I just missed the right bit of 
documentation.

Thanks,
Michael
