Re: [Java] Append multiple record batches together?

2019-11-14 Fan Liya
One use-case for ChunkedArray that comes to my mind is external sort for large vectors. Best, Liya Fan On Fri, Nov 15, 2019 at 2:14 PM Micah Kornfield wrote: > > > Maybe Java can add the concept of Tables and ChunkedArrays sometime in the future. > Is there a concrete use-case here?
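Liya Fan's external-sort use-case can be illustrated with a minimal, library-free sketch. Each chunk (standing in for one vector of a hypothetical Java ChunkedArray) is sorted on its own, and the sorted runs are then combined with a k-way merge driven by a priority queue. The plain `int[]` chunks and the class name `ChunkedExternalSort` are assumptions made to keep the example self-contained; this is not an Arrow Java API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Library-free sketch: int[] chunks stand in for the chunks of a hypothetical
// Java ChunkedArray. None of these names are Arrow Java APIs.
final class ChunkedExternalSort {

    // Cursor into one sorted run, ordered by the value it currently points at.
    private static final class Cursor implements Comparable<Cursor> {
        final int[] run;
        int pos = 0;

        Cursor(int[] run) {
            this.run = run;
        }

        int current() {
            return run[pos];
        }

        @Override
        public int compareTo(Cursor other) {
            return Integer.compare(current(), other.current());
        }
    }

    // Phase 1: sort every chunk independently (each is assumed to fit in memory).
    static List<int[]> sortRuns(List<int[]> chunks) {
        List<int[]> runs = new ArrayList<>();
        for (int[] chunk : chunks) {
            int[] run = chunk.clone();
            Arrays.sort(run);
            runs.add(run);
        }
        return runs;
    }

    // Phase 2: k-way merge of the sorted runs using a min-heap of cursors.
    static int[] mergeRuns(List<int[]> runs) {
        int total = 0;
        PriorityQueue<Cursor> heap = new PriorityQueue<>();
        for (int[] run : runs) {
            total += run.length;
            if (run.length > 0) {
                heap.add(new Cursor(run));
            }
        }
        int[] out = new int[total];
        int i = 0;
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();
            out[i++] = smallest.current();
            smallest.pos++;
            if (smallest.pos < smallest.run.length) {
                heap.add(smallest); // re-insert, now keyed on its next value
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> chunks = Arrays.asList(
                new int[] {9, 2, 7}, new int[] {5, 1}, new int[] {8, 3, 6, 4});
        // prints [1, 2, 3, 4, 5, 6, 7, 8, 9]
        System.out.println(Arrays.toString(mergeRuns(sortRuns(chunks))));
    }
}
```

In a real external sort the runs would be spilled to disk and streamed back during the merge; keeping them in memory here is a simplification.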

[jira] [Created] (ARROW-7175) [Website] Add a security page to track when vulnerabilities are patched

2019-11-14 Micah Kornfield (Jira)
Micah Kornfield created ARROW-7175: Summary: [Website] Add a security page to track when vulnerabilities are patched Key: ARROW-7175 URL: https://issues.apache.org/jira/browse/ARROW-7175 Project: Ap

Re: ConcatenateTables APIs

2019-11-14 Micah Kornfield
This sounds like a reasonable design to me. One question I had about SchemaUnificationOptions: will those only be applicable to Arrow schemas, or does it make sense to extend them to other use-cases (like the DataSet APIs)? Cheers, Micah On Fri, Nov 8, 2019 at 10:27 AM Zhuo Peng wrote: > Hi, > > http

Re: [Java] Call for reviewers

2019-11-14 Micah Kornfield
Thanks for taking these on. On Fri, Nov 8, 2019 at 12:20 PM David Li wrote: > I took a look at #5630 (ARROW-6662) and #5751 (ARROW-7019). > > Best, > David > > On 11/7/19, Micah Kornfield wrote: > > There are a few open PRs that I think could either use a first or second > > set of eyes: > > >

Re: [Java] Append multiple record batches together?

2019-11-14 Micah Kornfield
> > Maybe Java can add the concept of Tables and ChunkedArrays sometime in the future. Is there a concrete use-case here? It might pay to open up some JIRAs. I'm still not 100% clear on the rationale for the way VectorSchemaRoot is designed and how that would relate to Table/ChunkedArrays (or
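To make the Table/ChunkedArray idea concrete for the "append multiple record batches" question, here is a library-free sketch of what a chunked column could look like: appending a batch's column adds a chunk by reference instead of copying it into one contiguous vector, and a global row index is resolved to a (chunk, offset) pair by binary search over the chunks' starting offsets. The class name `ChunkedIntColumn` and the `int[]` chunks are hypothetical; as the thread notes, Arrow Java does not yet have Tables or ChunkedArrays.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: int[] chunks stand in for per-batch Arrow vectors.
// ChunkedIntColumn is not an Arrow Java class.
final class ChunkedIntColumn {
    private final List<int[]> chunks = new ArrayList<>();
    // offsets.get(i) = total number of rows in chunks 0..i-1
    private final List<Long> offsets = new ArrayList<>();
    private long length = 0;

    // Appends one batch's column by reference; no values are copied.
    void appendChunk(int[] chunk) {
        offsets.add(length);
        chunks.add(chunk);
        length += chunk.length;
    }

    long length() {
        return length;
    }

    int numChunks() {
        return chunks.size();
    }

    // Resolves a global row index to (chunk, offset) by binary search.
    int get(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        // Find the last chunk whose starting offset is <= index. It is never an
        // empty chunk: an empty chunk starts where the next chunk starts, and a
        // trailing empty chunk starts at length, which is > index.
        int lo = 0;
        int hi = chunks.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (offsets.get(mid) <= index) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return chunks.get(lo)[(int) (index - offsets.get(lo))];
    }

    public static void main(String[] args) {
        ChunkedIntColumn column = new ChunkedIntColumn();
        column.appendChunk(new int[] {10, 11, 12}); // e.g. column of record batch 1
        column.appendChunk(new int[] {20, 21});     // e.g. column of record batch 2
        // prints "5 rows in 2 chunks; row 3 = 20"
        System.out.println(column.length() + " rows in " + column.numChunks()
                + " chunks; row 3 = " + column.get(3));
    }
}
```

The design trade-off is the usual one for chunked layouts: appends are O(1) and copy-free, while random access pays a logarithmic lookup over the number of chunks.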

Re: Parquet cpp status

2019-11-14 Micah Kornfield
#1. If there isn't a JIRA, I would guess no one is working on it (note: I would expect at least the initial work to be in a Parquet JIRA item, and this is probably a discussion for that mailing list). #2. There are some open PRs to expose the Parquet reader through JNI to Java [1]. #3. It's possible Dremi

[jira] [Created] (ARROW-7174) [Python] Expose dictionary size parameter in python.

2019-11-14 Micah Kornfield (Jira)
Micah Kornfield created ARROW-7174: Summary: [Python] Expose dictionary size parameter in python. Key: ARROW-7174 URL: https://issues.apache.org/jira/browse/ARROW-7174 Project: Apache Arrow

[jira] [Created] (ARROW-7173) Add test to verify Map field names can be arbitrary

2019-11-14 Bryan Cutler (Jira)
Bryan Cutler created ARROW-7173: Summary: Add test to verify Map field names can be arbitrary Key: ARROW-7173 URL: https://issues.apache.org/jira/browse/ARROW-7173 Project: Apache Arrow Issue

Re: [C++][Parquet]: Stream API handling of optional fields

2019-11-14 Micah Kornfield
I think there are potentially other places in the Arrow code base where "optional" could be useful (e.g. a row-reader-like class for Arrow Tables). It looks like there is at least one header-only optional library [1] that is C++17 forward-compatible. I think I would lean towards vendoring that or an

[C++][Parquet]: Stream API handling of optional fields

2019-11-14 Gawain Bolton
Hello, I would like to add support for handling optional fields to the parquet::StreamReader and parquet::StreamWriter classes which I recently contributed (thank you!). Ideally I would do this by using std::optional like this:     parquet::StreamWriter writer{ parquet::ParquetFileWriter::Op

Re: Building Arrow 0.15.1 using dependencies in local source folder

2019-11-14 Neal Richardson
I am not an expert on this, but it seems you can specify `*_ROOT` arguments to cmake, like https://github.com/apache/arrow/blob/master/ci/PKGBUILD#L90-L91 Maybe that does what you need? Neal On Thu, Nov 14, 2019 at 12:45 PM Tahsin Hassan wrote: > Hi all, > > I am trying to build out arrow 0.1

Building Arrow 0.15.1 using dependencies in local source folder

2019-11-14 Tahsin Hassan
Hi all, I am trying to build Arrow 0.15.1. The dependencies for Arrow (e.g. Thrift, double-conversion) are in a local source folder, and we need to build them from that location. I read up on https://github.com/apache/arrow/blob/master/docs/source/developers/cpp.rst#offline-builds

Disabling Gandiva, Plasma, or other components

2019-11-14 Segovia, Carlos EXT
Hi, I read in https://github.com/apache/arrow/issues/4216 that it's possible to disable Gandiva, Plasma, or other components that you do not require. I'm trying to deploy an AWS Lambda with pandas and pyarrow, but I get the error "Unzipped size must be smaller than 262144000 bytes". How can I disa

[jira] [Created] (ARROW-7172) [C++][Dataset] Improve format of Expression::ToString

2019-11-14 Ben Kietzman (Jira)
Ben Kietzman created ARROW-7172: Summary: [C++][Dataset] Improve format of Expression::ToString Key: ARROW-7172 URL: https://issues.apache.org/jira/browse/ARROW-7172 Project: Apache Arrow Issu

[jira] [Created] (ARROW-7171) [Ruby] Pass Array for Arrow::Table#filter

2019-11-14 Yosuke Shiro (Jira)
Yosuke Shiro created ARROW-7171: Summary: [Ruby] Pass Array for Arrow::Table#filter Key: ARROW-7171 URL: https://issues.apache.org/jira/browse/ARROW-7171 Project: Apache Arrow Issue Type: New

Re: Achieving parity with Java extension types in Python

2019-11-14 Justin Polchlopek
I made a PR for this issue at https://github.com/apache/arrow/pull/5835. Would love some more detail about what was intended by the initial issue and what would be a better way. On Tue, Nov 12, 2019 at 11:25 AM Joris Van den Bossche < jorisvandenboss...@gmail.com> wrote: > Sorry for the delay in

Re: [DISCUSS] Dictionary Encoding Clarifications/Future Proofing

2019-11-14 Micah Kornfield
OK, anything else to discuss? Otherwise I'll plan on a new vote with the original language plus an explicit call-out in the PR that dictionary replacement isn't supported for the file format. On Thursday, November 14, 2019, Antoine Pitrou wrote: > > Right. The dictionaries can be found from the fi
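The delta-versus-replacement distinction in this thread can be sketched in plain Java. In the hypothetical `DictionaryTracker` below, a dictionary batch flagged as a delta appends entries to the existing dictionary for its id, while a non-delta batch for an id that already has a dictionary replaces it; following the call-out Micah proposes, replacement is rejected when reading the file format. Class and method names are illustrative assumptions, not Arrow APIs, and `String` entries stand in for arbitrary dictionary values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of dictionary-batch handling; not an Arrow API.
// String entries stand in for arbitrary dictionary values.
final class DictionaryTracker {
    private final boolean fileFormat; // true: IPC file format, false: IPC stream
    private final Map<Long, List<String>> dictionaries = new HashMap<>();

    DictionaryTracker(boolean fileFormat) {
        this.fileFormat = fileFormat;
    }

    // Applies one dictionary batch for dictionary `id`.
    // Deltas are accepted in both modes in this sketch.
    void onDictionaryBatch(long id, List<String> values, boolean isDelta) {
        List<String> existing = dictionaries.get(id);
        if (isDelta) {
            // In this sketch a delta with no prior dictionary is an error.
            if (existing == null) {
                throw new IllegalStateException("delta for unknown dictionary " + id);
            }
            existing.addAll(values); // appending keeps earlier indices valid
        } else if (existing == null) {
            dictionaries.put(id, new ArrayList<>(values)); // first dictionary for id
        } else if (fileFormat) {
            throw new IllegalStateException(
                    "dictionary replacement is not supported in the file format");
        } else {
            dictionaries.put(id, new ArrayList<>(values)); // stream: replace wholesale
        }
    }

    // Decodes dictionary indices against the current dictionary for `id`.
    List<String> decode(long id, int[] indices) {
        List<String> dict = dictionaries.get(id);
        if (dict == null) {
            throw new IllegalStateException("no dictionary for id " + id);
        }
        List<String> out = new ArrayList<>(indices.length);
        for (int index : indices) {
            out.add(dict.get(index));
        }
        return out;
    }

    public static void main(String[] args) {
        DictionaryTracker stream = new DictionaryTracker(false);
        stream.onDictionaryBatch(0, Arrays.asList("a", "b"), false);
        stream.onDictionaryBatch(0, Arrays.asList("c"), true); // delta
        System.out.println(stream.decode(0, new int[] {2, 0, 1})); // prints [c, a, b]
    }
}
```

Because a delta only appends, indices written before it still decode to the same values; a replacement silently changes their meaning, which is one way to see why it is awkward for a random-access file.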

Re: [Java] Question About Vector Allocation

2019-11-14 Micah Kornfield
ValueCount includes both null and non-null values. Perhaps a better name for the method would have been setSize or setLength. On Thursday, November 14, 2019, azim afroozeh wrote: > Thanks for your answer. I have one more question. In this test function for > example ( > https://github.com/apache

Re: [Java] Question About Vector Allocation

2019-11-14 azim afroozeh
Thanks for your answer. I have one more question. In this test function, for example (https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java#L1524), there is a for loop that fills in some values but not all of them. It leaves som

[jira] [Created] (ARROW-7170) [C++] Bundled ORC fails linking

2019-11-14 Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7170: Summary: [C++] Bundled ORC fails linking Key: ARROW-7170 URL: https://issues.apache.org/jira/browse/ARROW-7170 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-7169) [C++] Vendor uriparser library

2019-11-14 Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7169: Summary: [C++] Vendor uriparser library Key: ARROW-7169 URL: https://issues.apache.org/jira/browse/ARROW-7169 Project: Apache Arrow Issue Type: Wish

[NIGHTLY] Arrow Build Report for Job nightly-2019-11-14-0

2019-11-14 Crossbow
Arrow Build Report for Job nightly-2019-11-14-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-14-0 Failed Tasks: - homebrew-cpp: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-14-0-travis-homebrew-cpp - test-conda-python-3

[jira] [Created] (ARROW-7168) pa.array() doesn't respect provided dictionary type with all NaNs

2019-11-14 Thomas Buhrmann (Jira)
Thomas Buhrmann created ARROW-7168: Summary: pa.array() doesn't respect provided dictionary type with all NaNs Key: ARROW-7168 URL: https://issues.apache.org/jira/browse/ARROW-7168 Project: Apache A

[Discuss][Java] Appropriate semantics for comparing values in UnionVector

2019-11-14 Fan Liya
Dear all, The problem arises from the discussion in a PR: https://github.com/apache/arrow/pull/5544#discussion_r338394941. We are trying to come up with proper semantics for comparing values in UnionVectors. According to the current logic in the code base, two values from two UnionVectors are com
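The excerpt is cut off before it states the current rule, so the sketch below does not reproduce the code base's behaviour. It only contrasts two candidate semantics for equality of union values, using a hypothetical `Slot` model (a type id plus the selected child's value) rather than Arrow's UnionVector API. It assumes a union whose type ids 0 and 1 both select integer children, which is exactly the case where the two semantics disagree.

```java
import java.util.Objects;

// Hypothetical model contrasting two candidate equality semantics for
// union values; this is not Arrow's UnionVector API.
final class UnionComparison {

    // One union slot: which child it selects, and that child's value (null if null).
    static final class Slot {
        final byte typeId;
        final Object value;

        Slot(byte typeId, Object value) {
            this.typeId = typeId;
            this.value = value;
        }
    }

    // Strict: equal only if both slots select the same type id and hold equal values.
    static boolean strictEquals(Slot a, Slot b) {
        return a.typeId == b.typeId && Objects.equals(a.value, b.value);
    }

    // Value-only: ignore the type id and compare the selected child values.
    static boolean valueEquals(Slot a, Slot b) {
        return Objects.equals(a.value, b.value);
    }

    public static void main(String[] args) {
        // Assumption: type ids 0 and 1 both select integer children.
        Slot x = new Slot((byte) 0, 5);
        Slot y = new Slot((byte) 1, 5);
        System.out.println(strictEquals(x, y) + " " + valueEquals(x, y)); // prints false true
    }
}
```

The two functions agree whenever both slots select the same child; they differ only across children, which is the case the semantics need to pin down.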

Re: [Java] Question About Vector Allocation

2019-11-14 Fan Liya
Hi Azim, According to the current API, after filling in some values, you have to set the value count manually (through the setValueCount method). Otherwise, the value count remains 0. Best, Liya Fan On Thu, Nov 14, 2019 at 6:33 PM azim afroozeh wrote: > Thanks for your answer. So the valueCou
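Micah's and Liya Fan's answers can be modeled with a toy, library-free class. `ToyIntVector` is hypothetical and only mimics the semantics described in this thread: writing values does not advance the value count, setValueCount sets the logical length explicitly, and that length includes null slots. It is not Arrow's IntVector, whose internals (data and validity buffers, reallocation) differ.

```java
// Toy model, not Arrow code: mimics the value-count semantics described in this thread.
final class ToyIntVector {
    private final int[] values;
    private final boolean[] valid; // false = null (including slots never written)
    private int valueCount = 0;    // logical length; changed only by setValueCount

    ToyIntVector(int capacity) {
        values = new int[capacity];
        valid = new boolean[capacity];
    }

    // Writes a value and marks the slot non-null; does NOT change valueCount.
    void set(int index, int value) {
        values[index] = value;
        valid[index] = true;
    }

    void setNull(int index) {
        valid[index] = false;
    }

    // Reads a non-null value (this toy does not check index against valueCount).
    int get(int index) {
        if (!valid[index]) {
            throw new IllegalStateException("slot " + index + " is null");
        }
        return values[index];
    }

    // Sets the logical length. Every slot below it counts, null or not.
    void setValueCount(int count) {
        if (count < 0 || count > values.length) {
            throw new IllegalArgumentException("invalid value count " + count);
        }
        valueCount = count;
    }

    int getValueCount() {
        return valueCount;
    }

    // Nulls are counted only within the logical length.
    int getNullCount() {
        int nulls = 0;
        for (int i = 0; i < valueCount; i++) {
            if (!valid[i]) {
                nulls++;
            }
        }
        return nulls;
    }

    public static void main(String[] args) {
        ToyIntVector vector = new ToyIntVector(8);
        vector.set(0, 10);
        vector.set(2, 30); // slot 1 is never written, so it is null
        System.out.println(vector.getValueCount()); // prints 0
        vector.setValueCount(3);
        System.out.println(vector.getValueCount() + " " + vector.getNullCount()); // prints 3 1
    }
}
```

This is why the test linked in the thread reports a value count of 0 until it is set explicitly, and why, once set to 3, the count covers the unwritten (null) slot 1 as well.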

Re: [Java] Question About Vector Allocation

2019-11-14 azim afroozeh
Thanks for your answer. So the valueCount shows the number of values filled in the vector. Then why is the valueCount 0 after setting some values? For example: (https://github.com/apache/arrow/blob/3fbbcdaf77a9e354b6bd07ec1fd1dac005a505c9/java/vector/src/test/java/org/apache

Re: [DISCUSS] Dictionary Encoding Clarifications/Future Proofing

2019-11-14 Antoine Pitrou
Right. The dictionaries can be found from the file footer, so it seems OK. Thank you. Regards, Antoine. On 14/11/2019 at 07:11, Micah Kornfield wrote: > I'll add for: > > If so, how does this play with the fact that there potentially are delta >> dictionaries in the "stream"? > > That in

[jira] [Created] (ARROW-7167) [CI][Python] Add nightly tests for older pandas versions to Github Actions

2019-11-14 Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7167: Summary: [CI][Python] Add nightly tests for older pandas versions to Github Actions Key: ARROW-7167 URL: https://issues.apache.org/jira/browse/ARROW-7167