[jira] [Created] (ARROW-1142) [C++] Move over compression library toolchain from parquet-cpp

2017-06-22 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1142:
Summary: [C++] Move over compression library toolchain from parquet-cpp
Key: ARROW-1142
URL: https://issues.apache.org/jira/browse/ARROW-1142
Project: Apache Arrow

[jira] [Created] (ARROW-1141) on import get libjemalloc.so.2: cannot allocate memory in static TLS block

2017-06-22 Thread Kevin Grealish (JIRA)
Kevin Grealish created ARROW-1141:
Summary: on import get libjemalloc.so.2: cannot allocate memory in static TLS block
Key: ARROW-1141
URL: https://issues.apache.org/jira/browse/ARROW-1141
Project: Apache Arrow

[jira] [Created] (ARROW-1140) [C++] Allow optional build of plasma

2017-06-22 Thread Phillip Cloud (JIRA)
Phillip Cloud created ARROW-1140:
Summary: [C++] Allow optional build of plasma
Key: ARROW-1140
URL: https://issues.apache.org/jira/browse/ARROW-1140
Project: Apache Arrow
Issue Type: Improvement

[jira] [Created] (ARROW-1139) dlmalloc doesn't allow arrow to be built with clang 4 or gcc 7.1.1

2017-06-22 Thread Phillip Cloud (JIRA)
Phillip Cloud created ARROW-1139:
Summary: dlmalloc doesn't allow arrow to be built with clang 4 or gcc 7.1.1
Key: ARROW-1139
URL: https://issues.apache.org/jira/browse/ARROW-1139
Project: Apache Arrow

[Java] Best Practice to use/manage Allocator

2017-06-22 Thread Li Jin
Hello, I am writing some code that interacts with the Arrow Java library in Apache Spark, and I am trying to understand the best way to use and manage buffer allocators. I am wondering: (1) Is the buffer allocator thread safe? (2) Should I create a root allocator (maybe one per JVM?) to allocate all memory?

Re: Implementing (ARROW-1119) [Python] Enable reading Parquet data sets from Amazon S3

2017-06-22 Thread Wes McKinney
If you want to use pure Python, you should probably just use the s3fs package. We should be able to get better throughput using C++ (and using multithreading to make multiple requests for larger reads) -- the AWS C++ SDK probably has everything we need to make a really strong implementation.
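For the pure-Python route, here is a minimal sketch of what that could look like with s3fs plus pyarrow, assuming ParquetDataset accepts the s3fs filesystem object; the bucket and prefix are placeholders and credentials are assumed to come from the environment:

import pyarrow.parquet as pq
import s3fs

# s3fs handles the S3 I/O; AWS credentials are picked up from the environment
fs = s3fs.S3FileSystem()

# ParquetDataset takes a filesystem object, so the dataset can live in S3
dataset = pq.ParquetDataset("my-bucket/path/to/dataset", filesystem=fs)
table = dataset.read()
df = table.to_pandas()

This stays single-threaded on the Python side; the multithreaded, larger-read behavior described above would come from a C++/AWS SDK implementation.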

Re: Implementing (ARROW-1119) [Python] Enable reading Parquet data sets from Amazon S3

2017-06-22 Thread Colin Nichols
I am using pa.PythonFile() to wrap the file-like object provided by the s3fs package, and I am able to write Parquet files directly to S3 this way. I am not reading with pyarrow (I'm reading gzipped CSVs in plain Python), but I imagine reading would work much the same.
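For reference, a minimal sketch of that write path (and the symmetric read), using a placeholder bucket/key; this is an illustration of the pa.PythonFile wrapping described above, not a tested recipe:

import pyarrow as pa
import pyarrow.parquet as pq
import s3fs

fs = s3fs.S3FileSystem()  # credentials resolved from the environment

table = pa.Table.from_arrays(
    [pa.array([1, 2, 3]), pa.array(["a", "b", "c"])],
    names=["id", "label"],
)

# Write: wrap the s3fs file-like object so pyarrow sees a NativeFile sink
with fs.open("my-bucket/data/example.parquet", "wb") as raw:
    pq.write_table(table, pa.PythonFile(raw, mode="w"))

# Read back the same way (the part not yet exercised in this thread)
with fs.open("my-bucket/data/example.parquet", "rb") as raw:
    round_tripped = pq.read_table(pa.PythonFile(raw, mode="r"))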