[jira] [Created] (ARROW-8207) [Packaging][wheel] Use LLVM 8 in manylinux2010 and manylinux2014

2020-03-24 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-8207: --- Summary: [Packaging][wheel] Use LLVM 8 in manylinux2010 and manylinux2014 Key: ARROW-8207 URL: https://issues.apache.org/jira/browse/ARROW-8207 Project: Apache Arrow

Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-03-24 Thread Micah Kornfield
> > Compression ratios ranging from ~50% with LZ4 and ~75% with ZSTD on > the Taxi dataset to ~87% with LZ4 and ~90% with ZSTD on the Fannie Mae > dataset. So that's a huge space savings One more question on this. What was the average row-batch size used? I see in the proposal some buffers might

Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-03-24 Thread Wes McKinney
From what I've found searching on the internet - Java: * ZSTD -- JNI-based library available * LZ4 -- both JNI and native Java available - Go: ZSTD is a C binding, while there is an LZ4 native Go implementation - Rust: bindings to both C libraries available - C# wrapper libraries seem to be av

[jira] [Created] (ARROW-8206) [R] Minor fix for backwards compatibility on Linux installation

2020-03-24 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8206: -- Summary: [R] Minor fix for backwards compatibility on Linux installation Key: ARROW-8206 URL: https://issues.apache.org/jira/browse/ARROW-8206 Project: Apache Arr

[jira] [Created] (ARROW-8205) [Rust] Arrow should enforce unique field names in a schema

2020-03-24 Thread Andy Grove (Jira)
Andy Grove created ARROW-8205: - Summary: [Rust] Arrow should enforce unique field names in a schema Key: ARROW-8205 URL: https://issues.apache.org/jira/browse/ARROW-8205 Project: Apache Arrow Iss

[jira] [Created] (ARROW-8204) [Rust] [DataFusion] Add support for aliased expressions in SQL

2020-03-24 Thread Andy Grove (Jira)
Andy Grove created ARROW-8204: - Summary: [Rust] [DataFusion] Add support for aliased expressions in SQL Key: ARROW-8204 URL: https://issues.apache.org/jira/browse/ARROW-8204 Project: Apache Arrow

[jira] [Created] (ARROW-8203) [C#] "dotnet pack" is failed

2020-03-24 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-8203: --- Summary: [C#] "dotnet pack" is failed Key: ARROW-8203 URL: https://issues.apache.org/jira/browse/ARROW-8203 Project: Apache Arrow Issue Type: Improvement

[jira] [Created] (ARROW-8202) [Rust] SIGSEGV when using StringBuilder with jemalloc

2020-03-24 Thread Evan Chan (Jira)
Evan Chan created ARROW-8202: Summary: [Rust] SIGSEGV when using StringBuilder with jemalloc Key: ARROW-8202 URL: https://issues.apache.org/jira/browse/ARROW-8202 Project: Apache Arrow Issue Type

Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-03-24 Thread Micah Kornfield
Thanks Wes, It would be nice if contributors to other languages could express their opinions on the two compression formats selected (in particular if they represent challenges in using a suitable library for decompressing) -Micah On Tue, Mar 24, 2020 at 3:08 PM Wes McKinney wrote: > I just

Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-03-24 Thread Wes McKinney
I just opened this pull request with the proposed format additions based on this discussion: https://github.com/apache/arrow/pull/6707 If there is more feedback about the details, it would be good to know it now. In a couple of days I would like to call a vote to see if there is interest in forma

Re: Summary of RLE and other compression efforts?

2020-03-24 Thread Evan Chan
Hi Micah, Hope everyone is staying safe! > On Mar 16, 2020, at 9:41 PM, Micah Kornfield wrote: > > I feel a little uncomfortable in the fact that there isn't a more clearly > defined dividing line for what belongs in Arrow and what doesn't. I suppose > this is what discussions like these are

[jira] [Created] (ARROW-8201) [Python][Dataset] Improve ergonomics of FileFragment

2020-03-24 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-8201: --- Summary: [Python][Dataset] Improve ergonomics of FileFragment Key: ARROW-8201 URL: https://issues.apache.org/jira/browse/ARROW-8201 Project: Apache Arrow Issue

[jira] [Created] (ARROW-8200) [GLib] Rename garrow_file_system_target_info{,s}() to ..._file_info{,s}()

2020-03-24 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-8200: --- Summary: [GLib] Rename garrow_file_system_target_info{,s}() to ..._file_info{,s}() Key: ARROW-8200 URL: https://issues.apache.org/jira/browse/ARROW-8200 Project: Apache

[jira] [Created] (ARROW-8199) Guidance for creating multi-column sort on Table example?

2020-03-24 Thread Scott Wilson (Jira)
Scott Wilson created ARROW-8199: --- Summary: Guidance for creating multi-column sort on Table example? Key: ARROW-8199 URL: https://issues.apache.org/jira/browse/ARROW-8199 Project: Apache Arrow

[jira] [Created] (ARROW-8198) [C++] Diffing should handle null arrays

2020-03-24 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8198: - Summary: [C++] Diffing should handle null arrays Key: ARROW-8198 URL: https://issues.apache.org/jira/browse/ARROW-8198 Project: Apache Arrow Issue Type: Im

[jira] [Created] (ARROW-8197) [Rust] DataFusion "create_physical_plan" returns incorrect schema?

2020-03-24 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8197: -- Summary: [Rust] DataFusion "create_physical_plan" returns incorrect schema? Key: ARROW-8197 URL: https://issues.apache.org/jira/browse/ARROW-8197 Project: Apache Arrow

Preparing for 0.17.0 Arrow release

2020-03-24 Thread Neal Richardson
Hi all, A few weeks ago, there seemed to be consensus (lazy, at least) for a 0.17 release at the end of the month. Judging from https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.17.0+Release, it looks like we're getting closer. I'd encourage everyone to review their backlogs and (1) bump f

[NIGHTLY] Arrow Build Report for Job nightly-2020-03-24-0

2020-03-24 Thread Crossbow
Arrow Build Report for Job nightly-2020-03-24-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-24-0 Failed Tasks: - gandiva-jar-trusty: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-24-0-travis-gandiva-jar-trusty - test-co

[jira] [Created] (ARROW-8196) [Python] Empty table creation from schema with nested dictionary type

2020-03-24 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8196: Summary: [Python] Empty table creation from schema with nested dictionary type Key: ARROW-8196 URL: https://issues.apache.org/jira/browse/ARROW-8196 P

[jira] [Created] (ARROW-8195) [CI] Remove Boost download step in Github Actions

2020-03-24 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8195: - Summary: [CI] Remove Boost download step in Github Actions Key: ARROW-8195 URL: https://issues.apache.org/jira/browse/ARROW-8195 Project: Apache Arrow Issu

[jira] [Created] (ARROW-8194) [CI] Github Actions Windows job should run tests in parallel

2020-03-24 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8194: - Summary: [CI] Github Actions Windows job should run tests in parallel Key: ARROW-8194 URL: https://issues.apache.org/jira/browse/ARROW-8194 Project: Apache Arrow