
Re: Problem when saving large files via pyarrow hdfs

2018-04-23 Thread Wes McKinney

Hi Jan, the issue is that the hdfsWrite API uses int32_t (aka "tSize") for write sizes: https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/hdfs-internal.h#L69 So when writing more than INT32_MAX bytes, we must write in chunks. Can you please open a JIRA with your bug report and this informa
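
The chunking described above can be sketched roughly as follows. This is only an illustration of the idea, not the fix that went into Arrow: `write_in_chunks` is a hypothetical helper, and `sink` is assumed to be any file-like object with a `write()` method that accepts bytes-like data. The example uses an in-memory `io.BytesIO` in place of a real HDFS file handle.

```python
import io

# Largest value a signed 32-bit size (libhdfs "tSize") can hold.
INT32_MAX = 2**31 - 1


def write_in_chunks(sink, data, chunk_size=INT32_MAX):
    """Write `data` to `sink` in slices of at most `chunk_size` bytes.

    Hypothetical helper for illustration; returns the number of
    write() calls that were issued.
    """
    if not 0 < chunk_size <= INT32_MAX:
        raise ValueError("chunk_size must be between 1 and INT32_MAX")

    # View the buffer as raw bytes so slicing does not copy the payload.
    view = memoryview(data).cast("B")
    calls = 0
    for start in range(0, len(view), chunk_size):
        sink.write(view[start:start + chunk_size])
        calls += 1
    return calls


# Small demonstration with a tiny chunk size instead of a multi-GB buffer.
buf = io.BytesIO()
n_calls = write_in_chunks(buf, b"abcdefghij", chunk_size=4)
```

With a real file handle the same loop would keep every individual write at or below INT32_MAX bytes, which is the constraint the int32_t size type imposes.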

Problem when saving large files via pyarrow hdfs

2018-04-19 Thread Jan-Hendrik Zab

Hello! I'm currently trying to use pyarrow's HDFS library from within Hadoop Streaming, specifically in the reducer, with Python 3.6 (Anaconda). But the mentioned problem occurs either way. The pyarrow version is 0.9.0. I'm starting the actual Python script via a wrapper shell script that sets the LD_LIBRARY_