Re: Timeline for 0.15.0 release

2019-09-25 Thread Micah Kornfield
Just an FYI: I've started the RC generation process; the last commit from master is [1]. I am currently waiting on the crossbow builds (build-690 on ursa-labs/crossbow). I think this will take a little while, so I will pick it up tomorrow (Thursday). Thanks, Micah [1] https://github.com/apache/arrow/

Thread-safety guarantees of pyarrow Table (and other) objects

2019-09-25 Thread Yevgeni Litvin
Where in the documentation can I find information about the thread-safety guarantees of arrow classes? In particular, is the following usage of pyarrow.Table, shown in pseudo-code, thread-safe? arrow_table = pa.Table.from_pandas(df) def other_thread_worker_impl(arrow_table): arrow_table.colu

[jira] [Created] (ARROW-6700) [Rust] [DataFusion] Use new parquet arrow reader

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6700: - Summary: [Rust] [DataFusion] Use new parquet arrow reader Key: ARROW-6700 URL: https://issues.apache.org/jira/browse/ARROW-6700 Project: Apache Arrow Issue Type: I

Re: Unnesting ListArrays

2019-09-25 Thread Wes McKinney
hi Suhail, This follows the columnar format closely. The List layout is composed from a child array providing the "inner" values, which are given the List interpretation by adding an offsets buffer, and a validity buffer to distinguish null from 0-length list values. So flatten() here just returns
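The layout described in this reply can be sketched in plain Python. This is only an illustrative model, not pyarrow's implementation: the helper names (build_list_layout, flatten, flatten_keep_nulls) are hypothetical, and buffers are modelled as ordinary Python lists. It shows why returning the child values drops null list entries, and one possible way to keep a null marker per null list by consulting the validity and offsets buffers.

```python
def build_list_layout(rows):
    """Encode rows (each a list or None) as (values, offsets, validity).

    values   : the "inner" child values, concatenated
    offsets  : len(rows) + 1 positions into values
    validity : False marks a null list, as opposed to a 0-length list
    """
    values, offsets, validity = [], [0], []
    for row in rows:
        if row is not None:
            values.extend(row)
        validity.append(row is not None)
        offsets.append(len(values))
    return values, offsets, validity


def flatten(values, offsets, validity):
    """Model of a flatten that just hands back the child values."""
    return list(values)


def flatten_keep_nulls(values, offsets, validity):
    """Hypothetical variant: emit one None for each null list."""
    out = []
    for i, valid in enumerate(validity):
        if valid:
            out.extend(values[offsets[i]:offsets[i + 1]])
        else:
            out.append(None)
    return out


rows = [[0, 1], [0], None, None]
values, offsets, validity = build_list_layout(rows)
print(values)    # [0, 1, 0]
print(offsets)   # [0, 2, 3, 3, 3]
print(validity)  # [True, True, False, False]
print(flatten(values, offsets, validity))             # [0, 1, 0]
print(flatten_keep_nulls(values, offsets, validity))  # [0, 1, 0, None, None]
```

In this model a null list and an empty list share the same offsets (start equals end), so only the validity buffer can tell them apart.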

Unnesting ListArrays

2019-09-25 Thread Suhail Razzak
Hi, I'm working through a certain use case where I'm unnesting ListArrays, but I noticed something peculiar - null ListValues are not retained in the unnested array. E.g. In [0]: arr = pa.array([[0, 1], [0], None, None]) In [1]: arr.flatten() Out [1]: [0, 1, 0] While I would have expected [0, 1,

Re: Timeline for 0.15.0 release

2019-09-25 Thread Neal Richardson
IMO it's too risky to add something that adds a dependency (aws-sdk-cpp) on the day of cutting a release. Neal On Wed, Sep 25, 2019 at 12:54 PM Krisztián Szűcs wrote: > > We don't have a comprehensive documentation yet, so let's postpone it. > > > On Wed, Sep 25, 2019 at 9:48 PM Krisztián Szűcs

Re: Timeline for 0.15.0 release

2019-09-25 Thread Krisztián Szűcs
We don't have a comprehensive documentation yet, so let's postpone it. On Wed, Sep 25, 2019 at 9:48 PM Krisztián Szűcs wrote: > The S3 python bindings would be a nice addition to the release. > I don't think we should block on this but the PR is ready. Opinions? > https://github.com/apache/arro

Re: Timeline for 0.15.0 release

2019-09-25 Thread Krisztián Szűcs
The S3 python bindings would be a nice addition to the release. I don't think we should block on this but the PR is ready. Opinions? https://github.com/apache/arrow/pull/5423 On Wed, Sep 25, 2019 at 5:28 PM Micah Kornfield wrote: > OK, I'll start the process today. I'll send up e-mail update

Re: Build issues on macOS [newbie]

2019-09-25 Thread Tarek Allam Jr.
Thanks for the advice Uwe and Neal. I tried your suggestion (as well as turning many of the flags to off) but then ran into other errors afterwards such as: -- Using ZSTD_ROOT: /usr/local/anaconda3/envs/main CMake Error at /usr/local/Cellar/cmake/3.15.3/share/cmake/Modules/FindPackageHandleStand

Re: Parquet file reading performance

2019-09-25 Thread Joris Van den Bossche
From looking a little bit further into this, it seems that it is mainly pandas that is slower in creating a Series from an array of datetime64 compared to an array of ints. And especially if it is not nanosecond resolution: In [29]: a_int = pa.array(np.arange(10)) In [30]: %timeit a_int.to_

Re: Timeline for 0.15.0 release

2019-09-25 Thread Wes McKinney
Yes, all systems go as far as I'm concerned. On Wed, Sep 25, 2019 at 9:56 AM Neal Richardson wrote: > > Andy's DataFusion issue and Wes's Parquet one have both been merged, > and it looks like the LICENSE issue is being resolved as I type. So > are we good to go now? > > Neal > > > On Tue, Sep 24

Re: Timeline for 0.15.0 release

2019-09-25 Thread Micah Kornfield
OK, I'll start the process today. I'll send up e-mail updates as I make progress. On Wed, Sep 25, 2019 at 8:22 AM Wes McKinney wrote: > Yes, all systems go as far as I'm concerned. > > On Wed, Sep 25, 2019 at 9:56 AM Neal Richardson > wrote: > > > > Andy's DataFusion issue and Wes's Parquet on

[jira] [Created] (ARROW-6699) [C++] Add Parquet docs

2019-09-25 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6699: - Summary: [C++] Add Parquet docs Key: ARROW-6699 URL: https://issues.apache.org/jira/browse/ARROW-6699 Project: Apache Arrow Issue Type: Improvement

Re: Timeline for 0.15.0 release

2019-09-25 Thread Neal Richardson
Andy's DataFusion issue and Wes's Parquet one have both been merged, and it looks like the LICENSE issue is being resolved as I type. So are we good to go now? Neal On Tue, Sep 24, 2019 at 10:30 PM Andy Grove wrote: > > I found a last minute issue with DataFusion (Rust) and would appreciate it

[jira] [Created] (ARROW-6698) Please support Python __slots__

2019-09-25 Thread John Yost (Jira)
John Yost created ARROW-6698: Summary: Please support Python __slots__ Key: ARROW-6698 URL: https://issues.apache.org/jira/browse/ARROW-6698 Project: Apache Arrow Issue Type: New Feature

Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-09-25-0

2019-09-25 Thread Wes McKinney
Thanks Krisz. It doesn't appear there is anything here stopping us from releasing On Wed, Sep 25, 2019 at 9:15 AM Krisztián Szűcs wrote: > > wheel-osx-cp35m has failed with an unrelated timeout error, restarted it: > https://travis-ci.org/ursa-labs/crossbow/builds/589326914 > > On Wed, Sep 25,

Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-09-25-0

2019-09-25 Thread Krisztián Szűcs
wheel-osx-cp35m has failed with an unrelated timeout error, restarted it: https://travis-ci.org/ursa-labs/crossbow/builds/589326914 On Wed, Sep 25, 2019 at 4:11 PM Crossbow wrote: > > Arrow Build Report for Job nightly-2019-09-25-0 > > All tasks: > https://github.com/ursa-labs/crossbow/branche

[NIGHTLY] Arrow Build Report for Job nightly-2019-09-25-0

2019-09-25 Thread Crossbow
Arrow Build Report for Job nightly-2019-09-25-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0 Failed Tasks: - wheel-osx-cp35m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-wheel-osx-cp35m - docker-cpp-fu

[jira] [Created] (ARROW-6697) [Rust] [DataFusion] Validate that all parquet partitions have the same schema

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6697: - Summary: [Rust] [DataFusion] Validate that all parquet partitions have the same schema Key: ARROW-6697 URL: https://issues.apache.org/jira/browse/ARROW-6697 Project: Apache

[jira] [Created] (ARROW-6696) [Rust] [DataFusion] Implement simple math operations in physical query plan

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6696: - Summary: [Rust] [DataFusion] Implement simple math operations in physical query plan Key: ARROW-6696 URL: https://issues.apache.org/jira/browse/ARROW-6696 Project: Apache A

Re: Parquet file reading performance

2019-09-25 Thread Joris Van den Bossche
Hi Maarten, Thanks for the reproducible script. I ran it on my laptop on pyarrow master, and I am not seeing the difference between the two datetime indexes: Versions: Python: 3.7.3 | packaged by conda-forge | (default, Mar 27 2019, 23:01:00) [GCC 7.3.0] on linux numpy: 1.16.4 pandas: 0.26.0.dev0+

[jira] [Created] (ARROW-6695) [Rust] [DataFusion] Remove execution of logical plan

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6695: - Summary: [Rust] [DataFusion] Remove execution of logical plan Key: ARROW-6695 URL: https://issues.apache.org/jira/browse/ARROW-6695 Project: Apache Arrow Issue Typ

[jira] [Created] (ARROW-6694) [Rust] [DataFusion] Update integration tests to use physical plan

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6694: - Summary: [Rust] [DataFusion] Update integration tests to use physical plan Key: ARROW-6694 URL: https://issues.apache.org/jira/browse/ARROW-6694 Project: Apache Arrow

[jira] [Created] (ARROW-6693) [Rust] [DataFusion] Update unit tests to use physical query plan

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6693: - Summary: [Rust] [DataFusion] Update unit tests to use physical query plan Key: ARROW-6693 URL: https://issues.apache.org/jira/browse/ARROW-6693 Project: Apache Arrow

[jira] [Created] (ARROW-6692) [Rust] [DataFusion] Update examples to use physical query plan

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6692: - Summary: [Rust] [DataFusion] Update examples to use physical query plan Key: ARROW-6692 URL: https://issues.apache.org/jira/browse/ARROW-6692 Project: Apache Arrow

[jira] [Created] (ARROW-6691) [Rust] [DataFusion] Use tokio and Futures instead of spawning threads

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6691: - Summary: [Rust] [DataFusion] Use tokio and Futures instead of spawning threads Key: ARROW-6691 URL: https://issues.apache.org/jira/browse/ARROW-6691 Project: Apache Arrow

[jira] [Created] (ARROW-6690) [Rust] [DataFusion] HashAggregate without GROUP BY should use SIMD

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6690: - Summary: [Rust] [DataFusion] HashAggregate without GROUP BY should use SIMD Key: ARROW-6690 URL: https://issues.apache.org/jira/browse/ARROW-6690 Project: Apache Arrow

[jira] [Created] (ARROW-6689) [Rust] [DataFusion] Optimize query execution

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6689: - Summary: [Rust] [DataFusion] Optimize query execution Key: ARROW-6689 URL: https://issues.apache.org/jira/browse/ARROW-6689 Project: Apache Arrow Issue Type: New F

[jira] [Created] (ARROW-6688) [Packaging] Include s3 support in the conda packages

2019-09-25 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-6688: -- Summary: [Packaging] Include s3 support in the conda packages Key: ARROW-6688 URL: https://issues.apache.org/jira/browse/ARROW-6688 Project: Apache Arrow

[jira] [Created] (ARROW-6687) [Rust] [DataFusion]

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6687: - Summary: [Rust] [DataFusion] Key: ARROW-6687 URL: https://issues.apache.org/jira/browse/ARROW-6687 Project: Apache Arrow Issue Type: Bug Components: Rust

[jira] [Created] (ARROW-6686) [CI] Pull and push docker images to speed up the nightly builds

2019-09-25 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-6686: -- Summary: [CI] Pull and push docker images to speed up the nightly builds Key: ARROW-6686 URL: https://issues.apache.org/jira/browse/ARROW-6686 Project: Apache Ar

[jira] [Created] (ARROW-6685) [C++/Python] S3 FileStat object's base_path and type depends on trailing slash

2019-09-25 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-6685: -- Summary: [C++/Python] S3 FileStat object's base_path and type depends on trailing slash Key: ARROW-6685 URL: https://issues.apache.org/jira/browse/ARROW-6685 Proj

[jira] [Created] (ARROW-6684) [C++/Python] S3FileSystem.create_dir should raise for a nested directory with recursive keyword set to False

2019-09-25 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-6684: -- Summary: [C++/Python] S3FileSystem.create_dir should raise for a nested directory with recursive keyword set to False Key: ARROW-6684 URL: https://issues.apache.org/jira/brows