Re: PySpark failure [RE: [NIGHTLY] Arrow Build Report for Job nightly-2020-01-15-0]

2020-01-16 Thread Joris Van den Bossche
That sounds like a good solution. Having the zero-copy behavior depend on whether or not you have only one column of a certain type might lead to surprising results. To avoid yet another keyword, only doing it when split_blocks=True sounds good to me (in practice, that's also when it will happen

[jira] [Created] (ARROW-7600) [C++][Parquet] Add a basic disabled unit test to excercise nesting functionality

2020-01-16 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-7600: -- Summary: [C++][Parquet] Add a basic disabled unit test to excercise nesting functionality Key: ARROW-7600 URL: https://issues.apache.org/jira/browse/ARROW-7600 Pr

Re: [CI] Java build broken on master

2020-01-16 Thread Ji Liu
Thanks, PR opened https://github.com/apache/arrow/pull/6216, please help merge once the build turns green. -- From:Micah Kornfield Send Time:January 17, 2020 (Friday) 14:53 To:Ji Liu Cc:dev Subject:Re: [CI] Java build broken on master OK,

Re: [CI] Java build broken on master

2020-01-16 Thread Micah Kornfield
OK, I've opened https://issues.apache.org/jira/browse/ARROW-7599 to track. On Thu, Jan 16, 2020 at 10:49 PM Ji Liu wrote: > I was fixing, and will open a PR later. > > Thanks, > Ji Liu > > -- > From:Micah Kornfield > S

[jira] [Created] (ARROW-7599) [Java] Fix build break due to change in RangeEqualsVisitor

2020-01-16 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-7599: -- Summary: [Java] Fix build break due to change in RangeEqualsVisitor Key: ARROW-7599 URL: https://issues.apache.org/jira/browse/ARROW-7599 Project: Apache Arrow

Re: [CI] Java build broken on master

2020-01-16 Thread Ji Liu
I was fixing, and will open a PR later. Thanks, Ji Liu -- From:Micah Kornfield Send Time:January 17, 2020 (Friday) 14:48 To:dev Subject:[CI] Java build broken on master This was due to an unexpected conflict between two patche

[CI] Java build broken on master

2020-01-16 Thread Micah Kornfield
This was due to an unexpected conflict between two patches I just merged. I'm going to see if I can fix this quickly, otherwise I will rollback.

Re: [C++] Arrow added to OSS-Fuzz

2020-01-16 Thread Marco Neumann
Hey Antoine,  Thanks a lot also from my side.  The build is likely currently succeeding due to the Fuzzing work done by fuzzit. We had loads of crashes in the beginning and fixed tons of edge cases, especially around null pointer handling.  I also have some code locally for a Parquet fuzzing s

Re: [Format] Make fields required?

2020-01-16 Thread Micah Kornfield
I, too, couldn't find anything that says this would break backwards compatibility for the binary format. But it probably pays to open an issue with the flatbuffer team just to be safe. Two points: 1. I'd like to make sure we are conservative in choosing "definitely required" 2. Before committing

[jira] [Created] (ARROW-7598) Unable to install pyarrow

2020-01-16 Thread Rockwell Shabani (Jira)
Rockwell Shabani created ARROW-7598: --- Summary: Unable to install pyarrow Key: ARROW-7598 URL: https://issues.apache.org/jira/browse/ARROW-7598 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-7597) [C++] Improvements to CMake configuration console summary

2020-01-16 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-7597: --- Summary: [C++] Improvements to CMake configuration console summary Key: ARROW-7597 URL: https://issues.apache.org/jira/browse/ARROW-7597 Project: Apache Arrow

Re: PySpark failure [RE: [NIGHTLY] Arrow Build Report for Job nightly-2020-01-15-0]

2020-01-16 Thread Wes McKinney
I created https://issues.apache.org/jira/browse/ARROW-7596 and made it a blocker for 0.16.0 so this does not get lost in the shuffle On Thu, Jan 16, 2020 at 3:43 PM Wes McKinney wrote: > > hi Joris, > > Thanks for investigating this. It seems there were some unintended > consequences of the zero-

[jira] [Created] (ARROW-7596) [Python] Only apply zero-copy DataFrame block optimizations when split_blocks=True

2020-01-16 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-7596: --- Summary: [Python] Only apply zero-copy DataFrame block optimizations when split_blocks=True Key: ARROW-7596 URL: https://issues.apache.org/jira/browse/ARROW-7596 Projec

Re: [Format] Make fields required?

2020-01-16 Thread Wes McKinney
If using "required" does not alter the Flatbuffers binary format (it doesn't seem that it does; it adds non-null assertions on the write path and additional checks in the read verifiers, is that accurate?), then it may be worthwhile to set it on "definitely required" fields to spare clients from ha

Re: PySpark failure [RE: [NIGHTLY] Arrow Build Report for Job nightly-2020-01-15-0]

2020-01-16 Thread Wes McKinney
hi Joris, Thanks for investigating this. It seems there were some unintended consequences of the zero-copy optimizations from ARROW-3789. Another way forward might be to "opt in" to this behavior, or to only do the zero copy optimizations when split_blocks=True. What do you think? - Wes On Thu,

[jira] [Created] (ARROW-7595) [R][CI] R appveyor job fails on glob

2020-01-16 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7595: -- Summary: [R][CI] R appveyor job fails on glob Key: ARROW-7595 URL: https://issues.apache.org/jira/browse/ARROW-7595 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-7594) [C++] Implement HTTP and FTP file systems

2020-01-16 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-7594: --- Summary: [C++] Implement HTTP and FTP file systems Key: ARROW-7594 URL: https://issues.apache.org/jira/browse/ARROW-7594 Project: Apache Arrow Issue Type: New

[jira] [Created] (ARROW-7593) [CI][Python] Python datasets failing on master / not run on CI

2020-01-16 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7593: Summary: [CI][Python] Python datasets failing on master / not run on CI Key: ARROW-7593 URL: https://issues.apache.org/jira/browse/ARROW-7593 Project:

[jira] [Created] (ARROW-7592) [C++] Fix crashes on corrupt IPC input

2020-01-16 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7592: - Summary: [C++] Fix crashes on corrupt IPC input Key: ARROW-7592 URL: https://issues.apache.org/jira/browse/ARROW-7592 Project: Apache Arrow Issue Type: Bug

[NIGHTLY] Arrow Build Report for Job nightly-2020-01-16-0

2020-01-16 Thread Crossbow
Arrow Build Report for Job nightly-2020-01-16-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-16-0 Failed Tasks: - centos-8: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-16-0-azure-centos-8 - gandiva-jar-osx: URL: htt

[Format] Make fields required?

2020-01-16 Thread Antoine Pitrou
Hello, In Flatbuffers, all fields are optional by default. It means that the reader can get NULL (in C++) for a missing field. In turn, this means that message validation (at least in C++) should check all child table fields for non-NULL. Not only is this burdensome, but it's easy to miss som
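
The validation burden described in the message above can be illustrated with a plain-Python stand-in. This is only a sketch: it does not use the Flatbuffers API, and the Message, Schema and Field classes and the missing_fields helper are hypothetical names invented for the example. Every field defaults to None, so a validator has to walk each child table and check it explicitly:

```python
import dataclasses
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical message layout; every field is optional, mimicking
# the Flatbuffers default described in the message above.
@dataclass
class Field:
    name: Optional[str] = None
    type_name: Optional[str] = None


@dataclass
class Schema:
    fields: Optional[List[Field]] = None


@dataclass
class Message:
    version: Optional[int] = None
    schema: Optional[Schema] = None


def missing_fields(obj, path="message"):
    """Return the dotted paths of all fields that are None, recursing into children."""
    missing = []
    for f in dataclasses.fields(obj):
        value = getattr(obj, f.name)
        child_path = f"{path}.{f.name}"
        if value is None:
            missing.append(child_path)
        elif dataclasses.is_dataclass(value):
            missing.extend(missing_fields(value, child_path))
        elif isinstance(value, list):
            for i, item in enumerate(value):
                item_path = f"{child_path}[{i}]"
                if item is None:
                    missing.append(item_path)
                elif dataclasses.is_dataclass(item):
                    missing.extend(missing_fields(item, item_path))
    return missing


# A message whose nested field lacks a type is only caught by the full walk.
msg = Message(version=4, schema=Schema(fields=[Field(name="a")]))
print(missing_fields(msg))
```

Forgetting any one of these checks in a hand-written validator is the kind of omission the message warns about; marking fields as required would move the check into the generated verifier instead.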

[jira] [Created] (ARROW-7591) [Python] DictionaryArray.to_numpy returns dict of parts instead of numpy array

2020-01-16 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7591: Summary: [Python] DictionaryArray.to_numpy returns dict of parts instead of numpy array Key: ARROW-7591 URL: https://issues.apache.org/jira/browse/ARROW-7591

PySpark failure [RE: [NIGHTLY] Arrow Build Report for Job nightly-2020-01-15-0]

2020-01-16 Thread Joris Van den Bossche
So the spark integration build started to fail, and with the following test error: == ERROR: test_toPandas_batch_order (pyspark.sql.tests.test_arrow.EncryptionArrowTests) ---

[jira] [Created] (ARROW-7590) Update .gitignore for for thirdparty

2020-01-16 Thread Jiajia Li (Jira)
Jiajia Li created ARROW-7590: Summary: Update .gitignore for for thirdparty Key: ARROW-7590 URL: https://issues.apache.org/jira/browse/ARROW-7590 Project: Apache Arrow Issue Type: Improvement