[jira] [Created] (ARROW-8158) Getting length of data buffer and base variable width vector

2020-03-18 Thread Gaurangi Saxena (Jira)
Gaurangi Saxena created ARROW-8158: Summary: Getting length of data buffer and base variable width vector Key: ARROW-8158 URL: https://issues.apache.org/jira/browse/ARROW-8158 Project: Apache Arrow

Re: Arrow sync call

2020-03-18 Thread Neal Richardson
Attendees: Ben Kietzman, Uwe Korn, David Li, Rok Mihevc, Neal Richardson, François Saint Jacques, Krisztián Szucs
Discussion:
* C++ Datasets: file inspection not blocking scanning
* 0.17 timing, remaining important/blocking issues
* 1.0 reminder: pick integration tests back up after 0.17
* Discussion of…

[NIGHTLY] Arrow Build Report for Job nightly-2020-03-18-1

2020-03-18 Thread Crossbow
Arrow Build Report for Job nightly-2020-03-18-1
All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-18-1
Failed Tasks:
- conda-osx-clang-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-18-1-azure-conda-osx-clang-py36
- cond…

[jira] [Created] (ARROW-8157) [C++] Upgrade to LLVM 9

2020-03-18 Thread Jun NAITOH (Jira)
Jun NAITOH created ARROW-8157: Summary: [C++] Upgrade to LLVM 9 Key: ARROW-8157 URL: https://issues.apache.org/jira/browse/ARROW-8157 Project: Apache Arrow Issue Type: Improvement …

[jira] [Created] (ARROW-8156) [C++] Add variant of Filesystem::OpenInputFile that has memory-map like behavior if it is possible

2020-03-18 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8156: Summary: [C++] Add variant of Filesystem::OpenInputFile that has memory-map like behavior if it is possible Key: ARROW-8156 URL: https://issues.apache.org/jira/browse/ARROW-8156

[jira] [Created] (ARROW-8155) [C++] Add "ON only if system dependencies available" build mode for certain optional Arrow components

2020-03-18 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8155: Summary: [C++] Add "ON only if system dependencies available" build mode for certain optional Arrow components Key: ARROW-8155 URL: https://issues.apache.org/jira/browse/ARROW-8155

[jira] [Created] (ARROW-8154) HDFS Filesystem does not set environment variables in pyarrow 0.16.0 release

2020-03-18 Thread Eric Henry (Jira)
Eric Henry created ARROW-8154: Summary: HDFS Filesystem does not set environment variables in pyarrow 0.16.0 release Key: ARROW-8154 URL: https://issues.apache.org/jira/browse/ARROW-8154 Project: Apache Arrow …

[jira] [Created] (ARROW-8153) [Packaging] Update the conda feedstock files and upload artifacts to Anaconda

2020-03-18 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8153: Summary: [Packaging] Update the conda feedstock files and upload artifacts to Anaconda Key: ARROW-8153 URL: https://issues.apache.org/jira/browse/ARROW-8153 Project: Apache Arrow …

Re: [Discuss] Proposal for optimizing Datasets over S3/object storage

2020-03-18 Thread David Li
For us it applies to S3-like systems, not only S3 itself, at least. It does make sense to limit it to some filesystems. The behavior would be opt-in at the Parquet reader level, so at the Datasets or Filesystem layer we can take care of enabling the flag for filesystems where it actually helps. I…

[jira] [Created] (ARROW-8152) [C++] IO: split large coalesced reads into smaller ones

2020-03-18 Thread David Li (Jira)
David Li created ARROW-8152: Summary: [C++] IO: split large coalesced reads into smaller ones Key: ARROW-8152 URL: https://issues.apache.org/jira/browse/ARROW-8152 Project: Apache Arrow …
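ARROW-8152 names a two-step technique: first coalesce nearby byte ranges into larger reads, then split any coalesced read that grew too large back into bounded chunks. The sketch below is a hypothetical Python illustration of that idea only, not Arrow's C++ code; the functions `coalesce_ranges` and `split_large_ranges` and the parameters `hole_size_limit` and `range_size_limit` are names invented here.

```python
from typing import List, Tuple

# Hypothetical illustration: a range is (offset, length) in bytes.
Range = Tuple[int, int]


def coalesce_ranges(ranges: List[Range], hole_size_limit: int) -> List[Range]:
    """Merge ranges whose gap to the previous merged range is <= hole_size_limit."""
    if not ranges:
        return []
    ordered = sorted(ranges)
    merged = [ordered[0]]
    for offset, length in ordered[1:]:
        prev_offset, prev_length = merged[-1]
        prev_end = prev_offset + prev_length
        if offset - prev_end <= hole_size_limit:
            # Small gap (or overlap): extend the previous read to cover this range.
            new_end = max(prev_end, offset + length)
            merged[-1] = (prev_offset, new_end - prev_offset)
        else:
            merged.append((offset, length))
    return merged


def split_large_ranges(ranges: List[Range], range_size_limit: int) -> List[Range]:
    """Split each range longer than range_size_limit into chunks of at most that size."""
    if range_size_limit <= 0:
        raise ValueError("range_size_limit must be positive")
    out: List[Range] = []
    for offset, length in ranges:
        while length > range_size_limit:
            out.append((offset, range_size_limit))
            offset += range_size_limit
            length -= range_size_limit
        out.append((offset, length))
    return out


if __name__ == "__main__":
    reads = [(0, 100), (150, 100), (10_000, 50)]
    coalesced = coalesce_ranges(reads, hole_size_limit=64)
    print(coalesced)  # [(0, 250), (10000, 50)]
    print(split_large_ranges(coalesced, range_size_limit=100))
    # [(0, 100), (100, 100), (200, 50), (10000, 50)]
```

Splitting after coalescing keeps the benefit of fewer round trips for small, nearby column chunks while bounding the size of any single request, which on object stores may allow large reads to be issued in parallel.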

[jira] [Created] (ARROW-8151) [Benchmarking][Dataset] Benchmark Parquet read performance with S3File

2020-03-18 Thread David Li (Jira)
David Li created ARROW-8151: Summary: [Benchmarking][Dataset] Benchmark Parquet read performance with S3File Key: ARROW-8151 URL: https://issues.apache.org/jira/browse/ARROW-8151 Project: Apache Arrow

[jira] [Created] (ARROW-8150) [Rust] Allow writing custom FileMetaData k/v pairs

2020-03-18 Thread David Kegley (Jira)
David Kegley created ARROW-8150: Summary: [Rust] Allow writing custom FileMetaData k/v pairs Key: ARROW-8150 URL: https://issues.apache.org/jira/browse/ARROW-8150 Project: Apache Arrow …

Re: [Discuss] Proposal for optimizing Datasets over S3/object storage

2020-03-18 Thread Antoine Pitrou
On 18/03/2020 at 18:30, David Li wrote:
>> Instead of S3, you can use the Slow streams and Slow filesystem
>> implementations. It may better protect against varying external conditions.
>
> I think we'd want several different benchmarks - we want to ensure we
> don't regress local filesystem…

Re: [Discuss] Proposal for optimizing Datasets over S3/object storage

2020-03-18 Thread David Li
> Instead of S3, you can use the Slow streams and Slow filesystem
> implementations. It may better protect against varying external conditions.

I think we'd want several different benchmarks - we want to ensure we don't regress local filesystem performance, and we also want to measure in an actu…

Re: [Discuss] Proposal for optimizing Datasets over S3/object storage

2020-03-18 Thread Wes McKinney
On Wed, Mar 18, 2020 at 11:42 AM Antoine Pitrou wrote:
>
> On 18/03/2020 at 17:36, David Li wrote:
> > Hi all,
> >
> > Thanks to Antoine for implementing the core read coalescing logic.
> >
> > We've taken a look at what else needs to be done to get this working,
> > and it sounds like the fol…

Re: [Discuss] Proposal for optimizing Datasets over S3/object storage

2020-03-18 Thread Wes McKinney
hi David,

Yes, this sounds right to me. I would say that we should come up with the public API for column prebuffering ASAP and then get to work on implementing it and working to maximize the throughput.

- Wes

On Wed, Mar 18, 2020 at 11:37 AM David Li wrote:
>
> Hi all,
>
> Thanks to Antoine fo…

Re: [Discuss] Proposal for optimizing Datasets over S3/object storage

2020-03-18 Thread Antoine Pitrou
On 18/03/2020 at 17:36, David Li wrote:
> Hi all,
>
> Thanks to Antoine for implementing the core read coalescing logic.
>
> We've taken a look at what else needs to be done to get this working,
> and it sounds like the following changes would be worthwhile,
> independent of the rest of the o…

Re: [Discuss] Proposal for optimizing Datasets over S3/object storage

2020-03-18 Thread David Li
Hi all,

Thanks to Antoine for implementing the core read coalescing logic.

We've taken a look at what else needs to be done to get this working, and it sounds like the following changes would be worthwhile, independent of the rest of the optimizations we discussed:

- Add benchmarks of the curren…

[jira] [Created] (ARROW-8149) [C++/Python] Enable CUDA Support in conda recipes

2020-03-18 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-8149: Summary: [C++/Python] Enable CUDA Support in conda recipes Key: ARROW-8149 URL: https://issues.apache.org/jira/browse/ARROW-8149 Project: Apache Arrow Issue Type: New…

Arrow sync call

2020-03-18 Thread Neal Richardson
We're meeting on https://meet.google.com/paf-zymr-whn today. Apologies for the late reminder. Neal

[jira] [Created] (ARROW-8148) [Packaging][C++] Add google-cloud-cpp to conda-forge

2020-03-18 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8148: Summary: [Packaging][C++] Add google-cloud-cpp to conda-forge Key: ARROW-8148 URL: https://issues.apache.org/jira/browse/ARROW-8148 Project: Apache Arrow …

[jira] [Created] (ARROW-8147) [Packaging] Add google-cloud-cpp to ThirdpartyToolchain

2020-03-18 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8147: Summary: [Packaging] Add google-cloud-cpp to ThirdpartyToolchain Key: ARROW-8147 URL: https://issues.apache.org/jira/browse/ARROW-8147 Project: Apache Arrow …

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2020-03-18 Thread David Li
Following up here, I've submitted a draft implementation for C++: https://github.com/apache/arrow/pull/6656 The core functionality is there, but there are still holes that I need to implement. Compared to the draft spec, the client also sends a FlightDescriptor to begin with, though it's currently…

[jira] [Created] (ARROW-8146) [C++] Add per-filesystem facility to sanitize a path

2020-03-18 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8146: Summary: [C++] Add per-filesystem facility to sanitize a path Key: ARROW-8146 URL: https://issues.apache.org/jira/browse/ARROW-8146 Project: Apache Arrow …

[jira] [Created] (ARROW-8145) [C++] Rename GetTargetInfos

2020-03-18 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8145: Summary: [C++] Rename GetTargetInfos Key: ARROW-8145 URL: https://issues.apache.org/jira/browse/ARROW-8145 Project: Apache Arrow Issue Type: Wish

[jira] [Created] (ARROW-8144) [CI] Cmake 3.2 nightly builds fails

2020-03-18 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8144: Summary: [CI] Cmake 3.2 nightly builds fails Key: ARROW-8144 URL: https://issues.apache.org/jira/browse/ARROW-8144 Project: Apache Arrow Issue Type: Bug

[NIGHTLY] Arrow Build Report for Job nightly-2020-03-18-0

2020-03-18 Thread Crossbow
Arrow Build Report for Job nightly-2020-03-18-0
All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-18-0
Failed Tasks:
- conda-win-vs2015-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-18-0-azure-conda-win-vs2015-py36
- co…

[jira] [Created] (ARROW-8143) [C++] Provide a default implementation for ExtensionType::ExtensionEquals

2020-03-18 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8143: Summary: [C++] Provide a default implementation for ExtensionType::ExtensionEquals Key: ARROW-8143 URL: https://issues.apache.org/jira/browse/ARROW-8143 Project: Apache Arrow …

[jira] [Created] (ARROW-8142) [Python/C++] Casting empty table from after parquet roundtrip causes critical failure

2020-03-18 Thread Florian Jetter (Jira)
Florian Jetter created ARROW-8142: Summary: [Python/C++] Casting empty table from after parquet roundtrip causes critical failure Key: ARROW-8142 URL: https://issues.apache.org/jira/browse/ARROW-8142