[jira] [Created] (ARROW-2065) Fix bug in SerializationContext.clone().

2018-01-30 Thread Robert Nishihara (JIRA)
Robert Nishihara created ARROW-2065: Summary: Fix bug in SerializationContext.clone(). Key: ARROW-2065 URL: https://issues.apache.org/jira/browse/ARROW-2065 Project: Apache Arrow Issue Typ

[jira] [Created] (ARROW-2064) [GLib] Add common build problems link to the install section

2018-01-30 Thread yosuke shiro (JIRA)
yosuke shiro created ARROW-2064: Summary: [GLib] Add common build problems link to the install section Key: ARROW-2064 URL: https://issues.apache.org/jira/browse/ARROW-2064 Project: Apache Arrow

[jira] [Created] (ARROW-2063) [C++] Implement variant of FixedSizeBufferWriter that also supports reading (like MemoryMappedFile)

2018-01-30 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2063: Summary: [C++] Implement variant of FixedSizeBufferWriter that also supports reading (like MemoryMappedFile) Key: ARROW-2063 URL: https://issues.apache.org/jira/browse/ARROW-2063

[jira] [Created] (ARROW-2062) [C++] Stalled builds in test_serialization.py in Travis CI

2018-01-30 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2062: Summary: [C++] Stalled builds in test_serialization.py in Travis CI Key: ARROW-2062 URL: https://issues.apache.org/jira/browse/ARROW-2062 Project: Apache Arrow

Re: Duplicate Columns

2018-01-30 Thread Wes McKinney
In a sense, field names in Arrow schemas are "just data". Whether or not the data is invalid in the context of a particular use case may vary a great deal -- for example pandas supports duplicate column names (to its own hardship, admittedly) while most SQL systems do not. Sadly, sometimes duplica
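The point that duplicate names are only invalid for some consumers can be illustrated with a small check that a name-sensitive consumer might run before accepting a schema. This is a minimal sketch in plain Python, not part of the Arrow or pandas APIs; the helper names `duplicate_names` and `check_unique` are hypothetical and introduced only for illustration.

```python
from collections import Counter


def duplicate_names(field_names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(field_names)  # preserves insertion order on Python 3.7+
    return [name for name in counts if counts[name] > 1]


def check_unique(field_names):
    """Raise ValueError for consumers that require unique column names.

    Hypothetical helper: a SQL-like consumer could call this, while a
    consumer that tolerates duplicates would simply skip the check.
    """
    dupes = duplicate_names(field_names)
    if dupes:
        raise ValueError(f"duplicate column names: {dupes}")
```

A consumer that tolerates duplicates would never call `check_unique`, which keeps the decision about validity with the use case rather than with the schema itself.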

[jira] [Created] (ARROW-2061) [C++] Run ASAN builds in Travis CI

2018-01-30 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2061: Summary: [C++] Run ASAN builds in Travis CI Key: ARROW-2061 URL: https://issues.apache.org/jira/browse/ARROW-2061 Project: Apache Arrow Issue Type: Improvement

Duplicate Columns

2018-01-30 Thread Phillip Cloud
I'm working on ARROW-1974 right now, and it's turning out to be quite complex because both Arrow and Parquet allow duplicate columns. Apparently you can also write duplicate column names to Parquet by way of Spark. In my opinion, allowing duplic

[jira] [Created] (ARROW-2060) [Python] Documentation for creating StructArray using from_arrays or a sequence of dicts

2018-01-30 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2060: Summary: [Python] Documentation for creating StructArray using from_arrays or a sequence of dicts Key: ARROW-2060 URL: https://issues.apache.org/jira/browse/ARROW-2060

[jira] [Created] (ARROW-2059) [Python] Possible performance regression in Feather read/write path

2018-01-30 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2059: Summary: [Python] Possible performance regression in Feather read/write path Key: ARROW-2059 URL: https://issues.apache.org/jira/browse/ARROW-2059 Project: Apache Arrow

[jira] [Created] (ARROW-2058) Add wheels for Alpine Linux

2018-01-30 Thread Omer Katz (JIRA)
Omer Katz created ARROW-2058: Summary: Add wheels for Alpine Linux Key: ARROW-2058 URL: https://issues.apache.org/jira/browse/ARROW-2058 Project: Apache Arrow Issue Type: Task Component

Re: [Python] Disk size performance of Snappy vs Brotli vs Blosc

2018-01-30 Thread simba nyatsanga
Hi Everyone, Just an update on the above questions. I've updated the numbers in the Google sheet using data with less entropy here: https://docs.google.com/spreadsheets/d/1by1vCaO2p24PLq_NAA5Ckh1n3i-SoFYrRcfi1siYKFQ/edit#gid=0 I've also got the benchmarking code. Although some of the data examples mi

[jira] [Created] (ARROW-2057) [Python] Configure size of data pages in pyarrow.parquet.write_table

2018-01-30 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2057: Summary: [Python] Configure size of data pages in pyarrow.parquet.write_table Key: ARROW-2057 URL: https://issues.apache.org/jira/browse/ARROW-2057 Project: Apache Arro

Re: Moving Arrow Java to JDK 8

2018-01-30 Thread Li Jin
I created https://issues.apache.org/jira/browse/ARROW-2055 to track this. I also created the javadoc issue as a subtask. On Tue, Jan 30, 2018 at 11:44 AM, Dwight Gunning wrote: > Thanks Li, > > There is no JIRA as yet except for ARROW-2015 for the Joda-Time migration > to Java 8 Time - so please create

[jira] [Created] (ARROW-2056) Fix javadoc generation for Java 8

2018-01-30 Thread Li Jin (JIRA)
Li Jin created ARROW-2056: Summary: Fix javadoc generation for Java 8 Key: ARROW-2056 URL: https://issues.apache.org/jira/browse/ARROW-2056 Project: Apache Arrow Issue Type: Sub-task Rep

[jira] [Created] (ARROW-2055) [Java] Upgrade to Java 8

2018-01-30 Thread Li Jin (JIRA)
Li Jin created ARROW-2055: Summary: [Java] Upgrade to Java 8 Key: ARROW-2055 URL: https://issues.apache.org/jira/browse/ARROW-2055 Project: Apache Arrow Issue Type: Task Reporter: Li Jin

Re: Moving Arrow Java to JDK 8

2018-01-30 Thread Dwight Gunning
Thanks Li, There is no JIRA as yet except for ARROW-2015 for the Joda-Time migration to Java 8 Time - so please create a JIRA. Cheers Dwight Sent from my iPhone > On Jan 30, 2018, at 11:35 AM, Li Jin wrote: > > Thanks Dwight, > > I think it would be good to track the required items for mo

Re: Moving Arrow Java to JDK 8

2018-01-30 Thread Li Jin
Thanks Dwight, I think it would be good to track the required items for moving to Java 8 support. As far as I know, Arrow works with Java 8 already, so this shouldn't be too hard. Dependency-wise, among downstream projects Spark 2.3 already drops Java 7 support; I am not sure about Dremio. Is there a

[jira] [Created] (ARROW-2054) Compilation warnings

2018-01-30 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2054: Summary: Compilation warnings Key: ARROW-2054 URL: https://issues.apache.org/jira/browse/ARROW-2054 Project: Apache Arrow Issue Type: Task Compon

[jira] [Created] (ARROW-2053) [C++] Build instruction is incomplete

2018-01-30 Thread yosuke shiro (JIRA)
yosuke shiro created ARROW-2053: Summary: [C++] Build instruction is incomplete Key: ARROW-2053 URL: https://issues.apache.org/jira/browse/ARROW-2053 Project: Apache Arrow Issue Type: Improvem