Re: Contribute "RowSet" mechanism from Apache Drill?

2018-08-27 Thread Li Jin
Hi Paul, Thank you for the email. I think this is interesting. Arrow (Java API) currently doesn't have the capability of automatically limiting the memory size of record batches. In Spark we have similar needs to limit the size of record batches and have talked about implementing some kind of siz

[DISCUSS] Standardize Java style

2018-08-27 Thread Li Jin
Hi All, Bryan Cutler has started a PR to fix Java checkstyle warnings (Thank you Bryan!). In my experience, style is hard to get consensus on because of personal preference, so I wonder if we can pick a well-known style guide (say Google style: https://google.github.io/styleguide/javaguide.htm

Re: Progress on Arrow RPC a.k.a. Arrow Flight

2018-08-27 Thread Li Jin
Thank you both for the explanation, it makes sense. Another piece of feedback I have concerns flight.proto: some of the messages (such as FlightDescriptor and FlightPutInstruction) are not very clear to me. It would be helpful to get some more explanation for those here or on the PR. Thanks! Li On Sun,

[jira] [Created] (ARROW-3125) [Python] Update ASV instructions

2018-08-27 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3125: Summary: [Python] Update ASV instructions Key: ARROW-3125 URL: https://issues.apache.org/jira/browse/ARROW-3125 Project: Apache Arrow Issue Type: Bug

Re: [DISCUSS] Standardize Java style

2018-08-27 Thread Bryan Cutler
Thanks for bringing this discussion up, Li. I think we can use an existing style guide as a starting point, but ultimately we as a community should decide how best to adapt it to the project. I believe we already have the Google checkstyle as our Java rules configuration file, but already off the bat

Fwd: How to concatenate RecordBatches into a single RecordBatch?

2018-08-27 Thread Jacob Quinn Shenker
Hi all, Question: If I have a set of small (10-1000 rows) RecordBatches on disk or in memory, how can I (efficiently) concatenate/rechunk them into larger RecordBatches (so that each column is output as a contiguous array when written to a new Arrow buffer)? Context: With such small RecordBatches

[jira] [Created] (ARROW-3126) [Python] Add buffering option to pyarrow.open_stream to enable larger read ahead window for high latency file systems

2018-08-27 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3126: Summary: [Python] Add buffering option to pyarrow.open_stream to enable larger read ahead window for high latency file systems Key: ARROW-3126 URL: https://issues.apache.org/jira/br

Re: How to concatenate RecordBatches into a single RecordBatch?

2018-08-27 Thread Wes McKinney
Hi Jacob, We have https://issues.apache.org/jira/browse/ARROW-549 about concatenating arrays. Someone needs to write the code and tests, and then we can easily add an API to "consolidate" table columns. If you have small record batches, could you read the entire file into memory before parsing it

Re: Contribute "RowSet" mechanism from Apache Drill?

2018-08-27 Thread Jacques Nadeau
This seems like it could be a useful addition. In general, our experience with writing Arrow structures is that the optimal path is columnar interaction rather than row-wise. That being said, most people start out by interacting with Arrow row-wise first and having an interface like this c

[jira] [Created] (ARROW-3127) [C++] Add Tutorial about Sending Tensor from C++ to Python

2018-08-27 Thread Simon Mo (JIRA)
Simon Mo created ARROW-3127: Summary: [C++] Add Tutorial about Sending Tensor from C++ to Python Key: ARROW-3127 URL: https://issues.apache.org/jira/browse/ARROW-3127 Project: Apache Arrow Issue T

Re: Contribute "RowSet" mechanism from Apache Drill?

2018-08-27 Thread Paul Rogers
Hi Jacques, Thanks much for the note. I wonder, when reading data into, or out of, Arrow, are not the interfaces often row-wise? For example, it is somewhat difficult to read a CSV file column-wise. Similarly, when serving a BI tool (for tables or charts), data must be presented row-wise. (JDBC

[jira] [Created] (ARROW-3128) [C++] Support system shared zlib

2018-08-27 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-3128: Summary: [C++] Support system shared zlib Key: ARROW-3128 URL: https://issues.apache.org/jira/browse/ARROW-3128 Project: Apache Arrow Issue Type: Improvement

[jira] [Created] (ARROW-3129) [Packaging] Stop to use deprecated BuildRoot and Group in .rpm

2018-08-27 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-3129: Summary: [Packaging] Stop to use deprecated BuildRoot and Group in .rpm Key: ARROW-3129 URL: https://issues.apache.org/jira/browse/ARROW-3129 Project: Apache Arrow