Re: Problem when saving large files via pyarrow hdfs

2018-04-23 Thread Wes McKinney
Hi Jan, the issue is that the hdfsWrite API uses int32_t (aka "tSize") for write sizes: https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/hdfs-internal.h#L69 So when writing files larger than INT32_MAX bytes, we must write in chunks. Can you please open a JIRA with your bug report and this informa
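
To illustrate the chunking idea described above, here is a minimal Python sketch. It is not the fix that belongs in Arrow's C++ HDFS layer; it only shows the general technique of splitting one large write into pieces that each fit in a signed 32-bit size. The helper name `write_in_chunks` and its `chunk_size` parameter are hypothetical, and `sink` is assumed to be any object with a `write()` method that accepts bytes-like data.

```python
import io

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit size (such as tSize) can hold


def write_in_chunks(sink, data, chunk_size=INT32_MAX):
    """Write `data` to `sink` in pieces of at most `chunk_size` bytes.

    Hypothetical helper for illustration only: `sink` is any object with
    a write() method, `data` is assumed to be bytes or bytearray.
    Returns the number of write() calls issued.
    """
    if not 0 < chunk_size <= INT32_MAX:
        raise ValueError("chunk_size must be between 1 and INT32_MAX")

    view = memoryview(data)  # slicing a memoryview avoids copying each chunk
    chunks = 0
    for start in range(0, len(view), chunk_size):
        sink.write(view[start:start + chunk_size])
        chunks += 1
    return chunks


if __name__ == "__main__":
    # Small in-memory demonstration; a tiny chunk size stands in for INT32_MAX.
    buffer = io.BytesIO()
    calls = write_in_chunks(buffer, b"abcdefghij", chunk_size=4)
    print(calls, buffer.getvalue())
```

In the situation from this thread, `sink` would presumably be the file handle opened through pyarrow's HDFS interface; that usage is an assumption and is not shown here.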

Problem when saving large files via pyarrow hdfs

2018-04-19 Thread Jan-Hendrik Zab
Hello! I'm currently trying to use pyarrow's hdfs lib from within Hadoop streaming, specifically in the reducer with Python 3.6 (Anaconda). But the mentioned problem occurs either way. The pyarrow version is 0.9.0. I'm starting the actual Python script via a wrapper sh script that sets the LD_LIBRARY_