hi Jan,
The issue is that the hdfsWrite API uses tSize (an int32_t) for write sizes:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/hdfs-internal.h#L69
So when writing more than INT32_MAX bytes, we must write in chunks. Can you
please open a JIRA with your bug report and this information?
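
For what it's worth, here is a minimal sketch of what such chunking could look like at the libhdfs layer. ChunkedWrite is a hypothetical helper name; the sketch assumes the standard hdfsWrite signature from libhdfs's hdfs.h, and Arrow's actual fix may look different:

```cpp
// Sketch: writing a buffer larger than INT32_MAX through libhdfs,
// which caps each hdfsWrite() call at tSize (int32_t) bytes.
// Assumes the standard libhdfs signature:
//   tSize hdfsWrite(hdfsFS fs, hdfsFile file, const void* buffer, tSize length);
#include <hdfs.h>  // libhdfs; header location may vary by distribution
#include <algorithm>
#include <cstdint>
#include <limits>

// Hypothetical helper: returns true on success.
// fs and file are assumed to be already open and valid.
bool ChunkedWrite(hdfsFS fs, hdfsFile file, const uint8_t* data, int64_t nbytes) {
  // INT32_MAX is the hard API limit; a smaller chunk size would also work.
  constexpr int64_t kMaxChunk = std::numeric_limits<int32_t>::max();
  int64_t offset = 0;
  while (offset < nbytes) {
    const tSize chunk =
        static_cast<tSize>(std::min(kMaxChunk, nbytes - offset));
    const tSize written = hdfsWrite(fs, file, data + offset, chunk);
    if (written < 0) {
      return false;  // libhdfs signals errors with a negative return value
    }
    offset += written;  // hdfsWrite may write fewer bytes than requested
  }
  return true;
}
```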
Hello!
I'm currently trying to use pyarrow's HDFS library from within Hadoop
Streaming, specifically in the reducer, with Python 3.6 (Anaconda), but
the problem occurs either way. The pyarrow version is 0.9.0.
I'm starting the actual Python script via a wrapper shell script that sets
the LD_LIBRARY_PATH.