RE: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-05-02 Thread Jed Brown
I would caution to please not commit to the MKL/BLAS model in which the library creates threads internally. It's a disaster for managing oversubscription and affinity issues among groups of threads and/or multiple processes (e.g., MPI). For example, a composable OpenMP technique is for the caller

Re: ARROW-3191: Making ArrowBuf work with arbitrary memory and setting io.netty.tryReflectionSetAccessible to true for java builds

2019-05-02 Thread Jacques Nadeau
I'm on board with this change. On Fri, Apr 26, 2019 at 2:14 AM Siddharth Teotia wrote: > As part of working on this patch <https://github.com/apache/arrow/pull/4151>, I ran into a problem with jdk 9 and 11 builds. Since memory underlying ArrowBuf may not necessarily be a ByteBuf (or any o

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-02 Thread Bryan Cutler
I looked at the updated SPIP and I think the reduced scope sounds better. From the Spark Summit, it seemed like there was a lot of interest in columnar processing and this would be a good starting point to enable that. It would be great to hear some other people's input too. Bryan On Tue, Apr 30,

RE: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-05-02 Thread Malakhov, Anton
Thanks Wes! Sounds like a good way to go! We'll create a demo, as you suggested, implementing a parallel execution model for a simple analytics pipeline that reads and processes the files. My only concern is about adding more pipeline breaker nodes and compute intensive operations into this dem

Re: PARQUET-1411 / PR 4185

2019-05-02 Thread Wes McKinney
+ Parquet dev list Thanks Tim for working on this issue, I'm sorry I haven't been able to leave code review yet -- I've been busy with a bunch of other things and, since it's a large patch, I wanted to give thoughtful feedback. Feel free to push some more commits to that PR. I can prioritize gett

PARQUET-1411 / PR 4185

2019-05-02 Thread TP Boudreau
Hello Parquet-Arrow Team, Wes, A short while ago, I submitted PR 4185 (https://github.com/apache/arrow/pull/4185) to implement in the C++ library the new logical annotations metadata available in the latest parquet.thrift spec (https://issues.apache.org/jira/browse/PARQUET-1411). I stopped commi

[jira] [Created] (ARROW-5251) [C++][Parquet] Bad initialization in statistics computation

2019-05-02 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5251: Summary: [C++][Parquet] Bad initialization in statistics computation Key: ARROW-5251 URL: https://issues.apache.org/jira/browse/ARROW-5251 Project:

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-05-02 Thread Jacques Nadeau
If someone wants to run without bounds checking, why don't they simply flip the system property? Are they seeing that code not get eliminated if they set that? I think people are optimizing the wrong things in this discussion. The memory address is available. Per Parth's comments, if you're work

Re: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-05-02 Thread Wes McKinney
hi Anton, Thank you for bringing your expertise to the project -- this is a very useful discussion to have. Partly why our threading capabilities in the project are not further developed is that there is not much that needs to be parallelized. It would be like designing a supercharger when you do

[jira] [Created] (ARROW-5250) remove javadoc suppression on methods.

2019-05-02 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-5250: Summary: remove javadoc suppression on methods. Key: ARROW-5250 URL: https://issues.apache.org/jira/browse/ARROW-5250 Project: Apache Arrow Issue Type: S

[jira] [Created] (ARROW-5249) Java Flight client doesn't handle auth correctly in some cases

2019-05-02 Thread Ryan Murray (JIRA)
Ryan Murray created ARROW-5249: Summary: Java Flight client doesn't handle auth correctly in some cases Key: ARROW-5249 URL: https://issues.apache.org/jira/browse/ARROW-5249 Project: Apache Arrow

[jira] [Created] (ARROW-5248) [Python] support dateutil timezones

2019-05-02 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5248: Summary: [Python] support dateutil timezones Key: ARROW-5248 URL: https://issues.apache.org/jira/browse/ARROW-5248 Project: Apache Arrow Issu

Re: How about inet4/inet6/macaddr data types?

2019-05-02 Thread David Li
Re: Java support, I've sketched out an implementation: https://github.com/lihalite/arrow/pull/2 On 5/1/19, Micah Kornfield wrote: >> >> I'm awaiting community feedback about the approach to implementing >> extension types, whether the approach that I've used (using the >> following keys in custom