Re: some questions, please help

2019-11-07 Thread Micah Kornfield
> > I wonder how Arrow deals with gaps among different implementations? Say, the C++ lib implements some features the Go lib doesn't support. Is there a consistent API document, or documents for each language implementation? It is important to distinguish between two types of functionality: 1. Suppo

Re: some questions, please help

2019-11-07 Thread Yibo Cai
Hi Wes, On 10/30/19 10:24 PM, Wes McKinney wrote: hi Yibo On Wed, Oct 30, 2019 at 2:16 AM Yibo Cai wrote: Hi, I'm new to Arrow. I'd like to seek help with some questions. Any comments are welcome. - About the source code tree, my understanding is that "cpp" is the core arrow libraries, "c

[jira] [Created] (ARROW-7099) [C++] Disambiguate function calls in csv parser test

2019-11-07 Thread Prudhvi Porandla (Jira)
Prudhvi Porandla created ARROW-7099: --- Summary: [C++] Disambiguate function calls in csv parser test Key: ARROW-7099 URL: https://issues.apache.org/jira/browse/ARROW-7099 Project: Apache Arrow

[CVE-2019-12408][CVE-2019-12410] Uninitialized Memory Vulnerabilities fixed in Apache Arrow 0.15.1

2019-11-07 Thread Micah Kornfield
The Apache Arrow project would like to hereby disclose that our 0.15.1 release patches two uninitialized memory bugs (CVE-2019-12408 and CVE-2019-12410) in the C++ implementation (which in turn can affect Python, Ruby and R). In both cases there is a potential vulnerability where data in memory can

[jira] [Created] (ARROW-7098) [Java] Improve the performance of comparing two memory blocks

2019-11-07 Thread Liya Fan (Jira)
Liya Fan created ARROW-7098: --- Summary: [Java] Improve the performance of comparing two memory blocks Key: ARROW-7098 URL: https://issues.apache.org/jira/browse/ARROW-7098 Project: Apache Arrow Iss

[jira] [Created] (ARROW-7097) [Rust][CI] Builds failing due to rust nightly

2019-11-07 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-7097: --- Summary: [Rust][CI] Builds failing due to rust nightly Key: ARROW-7097 URL: https://issues.apache.org/jira/browse/ARROW-7097 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-7096) [C++] Add options structs for concatenation-with-promotion and schema unification

2019-11-07 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-7096: --- Summary: [C++] Add options structs for concatenation-with-promotion and schema unification Key: ARROW-7096 URL: https://issues.apache.org/jira/browse/ARROW-7096 Project

Re: [DISCUSS] C data interface updated

2019-11-07 Thread Wes McKinney
Thanks Antoine. Do we need to hold a vote to adopt the C API? I will see if I can review the C++ implementation prior to leaving for ~1 week of vacation next Tuesday On Thu, Nov 7, 2019 at 1:24 PM Antoine Pitrou wrote: > > > Hello, > > The C data interface spec was updated following a suggestion

Merged C++ Parquet Encryption implementation PARQUET-1300

2019-11-07 Thread Wes McKinney
hi folks, I recently merged https://github.com/apache/arrow/pull/4826 containing the bulk of the Parquet C++ encrypted file implementation: https://github.com/apache/arrow/commit/41753ace481a82dea651c54639ec4adbae169187 This patch has been in progress for over a year with numerous rounds of revi

[Java] Call for reviewers

2019-11-07 Thread Micah Kornfield
There are a few open PRs that I think could either use a first or second set of eyes: https://github.com/apache/arrow/pull/5630 https://github.com/apache/arrow/pull/5645 https://github.com/apache/arrow/pull/5751 Would some committers be willing to take a look? Thanks, Micah

Re: [DISCUSS] Dictionary Encoding Clarifications/Future Proofing

2019-11-07 Thread Micah Kornfield
I think the main sticking point was dictionaries in the file format. It seems like the use-case for delta dictionaries might be limited so I didn't feel strongly about it. Antoine, did you have more thoughts on this? Thanks, Micah On Wed, Nov 6, 2019 at 9:24 AM Wes McKinney wrote: > Just bum

[jira] [Created] (ARROW-7095) [R] Better handling of unsupported filter expression in dplyr methods

2019-11-07 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7095: -- Summary: [R] Better handling of unsupported filter expression in dplyr methods Key: ARROW-7095 URL: https://issues.apache.org/jira/browse/ARROW-7095 Project: Apac

[jira] [Created] (ARROW-7094) [R] Change FileSystem access in Datasets to shared_ptr

2019-11-07 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7094: -- Summary: [R] Change FileSystem access in Datasets to shared_ptr Key: ARROW-7094 URL: https://issues.apache.org/jira/browse/ARROW-7094 Project: Apache Arrow

[jira] [Created] (ARROW-7093) [R] Support creating ScalarExpressions for more data types

2019-11-07 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7093: -- Summary: [R] Support creating ScalarExpressions for more data types Key: ARROW-7093 URL: https://issues.apache.org/jira/browse/ARROW-7093 Project: Apache Arrow

[jira] [Created] (ARROW-7092) [R] Add vignette for dplyr and datasets

2019-11-07 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7092: -- Summary: [R] Add vignette for dplyr and datasets Key: ARROW-7092 URL: https://issues.apache.org/jira/browse/ARROW-7092 Project: Apache Arrow Issue Type:

Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-11-07-0

2019-11-07 Thread Sutou Kouhei
Hi, > - centos-7: > URL: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-07-0-azure-centos-7 I've opened an issue for this: https://issues.apache.org/jira/browse/PARQUET-1688 It's caused by https://github.com/apache/arrow/pull/5699 . It seems that g++ 4.8.5 doesn't

[DISCUSS] C data interface updated

2019-11-07 Thread Antoine Pitrou
Hello, The C data interface spec was updated following a suggestion by Wes and Uwe: https://github.com/apache/arrow/blob/3173f88dfa32ce3296a121b032f351e089888601/docs/source/format/CDataInterface.rst The metadata encoding was changed. It does not use JSON anymore but a very simple binary encod

[jira] [Created] (ARROW-7091) [C++] Move all factories to type_fwd.h

2019-11-07 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7091: - Summary: [C++] Move all factories to type_fwd.h Key: ARROW-7091 URL: https://issues.apache.org/jira/browse/ARROW-7091 Project: Apache Arrow Issue Type: Imp

[jira] [Created] (ARROW-7090) [C++] AssertFieldEqual (and friends) doesn't show metadata on failure

2019-11-07 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7090: - Summary: [C++] AssertFieldEqual (and friends) doesn't show metadata on failure Key: ARROW-7090 URL: https://issues.apache.org/jira/browse/ARROW-7090 Project: Apache

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-11-07 Thread David Li
I've been extremely backlogged, I will update the proposal when I get a chance and reply here when done. Best, David On 11/7/19, Wes McKinney wrote: > Bumping this discussion since a couple of weeks have passed. It seems > there are still some questions here, could we summarize what are the > al

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-11-07 Thread Wes McKinney
Bumping this discussion since a couple of weeks have passed. It seems there are still some questions here, could we summarize what are the alternatives along with any public API implications so we can try to render a decision? On Sat, Oct 26, 2019 at 7:19 PM David Li wrote: > > Hi Wes, > > Respon

Re: Apache Arrow build with needed dependencies only

2019-11-07 Thread Wes McKinney
I just opened https://issues.apache.org/jira/browse/ARROW-7089 about increasing transparency around what options are causing thirdparty dependencies to be required On Thu, Nov 7, 2019 at 10:05 AM Wes McKinney wrote: > > hi Richard, > > On Thu, Nov 7, 2019 at 9:59 AM Richard Bachmann > wrote: > >

[jira] [Created] (ARROW-7089) [C++] In CMake output, list each enabled thirdparty toolchain dependency and the reason for its being enabled

2019-11-07 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-7089: --- Summary: [C++] In CMake output, list each enabled thirdparty toolchain dependency and the reason for its being enabled Key: ARROW-7089 URL: https://issues.apache.org/jira/browse/ARR

Re: Apache Arrow build with needed dependencies only

2019-11-07 Thread Wes McKinney
hi Richard, On Thu, Nov 7, 2019 at 9:59 AM Richard Bachmann wrote: > > Hello, > I'm contacting you on behalf of the LCG Releases team at CERN. We > provide a common software stack for LHCb, ATLAS and others to be used at > CERN and the worldwide computing grid. > > Right now we're looking into op

Apache Arrow build with needed dependencies only

2019-11-07 Thread Richard Bachmann
Hello, I'm contacting you on behalf of the LCG Releases team at CERN. We provide a common software stack for LHCb, ATLAS and others to be used at CERN and the worldwide computing grid. Right now we're looking into optimizing the way we're building Apache Arrow (C++ & Python) and its dependenc

Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-11-07-0

2019-11-07 Thread Wes McKinney
I opened https://issues.apache.org/jira/browse/ARROW-7088 about the legitimate wheel build failures. Haven't looked at the others yet On Thu, Nov 7, 2019 at 7:01 AM Crossbow wrote: > > > Arrow Build Report for Job nightly-2019-11-07-0 > > All tasks: > https://github.com/ursa-labs/crossbow/branch

[jira] [Created] (ARROW-7088) [C++][Python] gcc 4.8 / wheel builds failing after PARQUET-1678

2019-11-07 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-7088: --- Summary: [C++][Python] gcc 4.8 / wheel builds failing after PARQUET-1678 Key: ARROW-7088 URL: https://issues.apache.org/jira/browse/ARROW-7088 Project: Apache Arrow

[jira] [Created] (ARROW-7087) [Pyarrow] Table Metadata disappear when we write a partitioned dataset

2019-11-07 Thread Jira
François Blanchard created ARROW-7087: - Summary: [Pyarrow] Table Metadata disappear when we write a partitioned dataset Key: ARROW-7087 URL: https://issues.apache.org/jira/browse/ARROW-7087 Projec

[jira] [Created] (ARROW-7086) [C++] Provide a wrapper for invoking factories to produce a Result

2019-11-07 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-7086: --- Summary: [C++] Provide a wrapper for invoking factories to produce a Result Key: ARROW-7086 URL: https://issues.apache.org/jira/browse/ARROW-7086 Project: Apache Arrow

[jira] [Created] (ARROW-7085) [C++][CSV] Add support for Extension type in csv reader

2019-11-07 Thread Artem Alekseev (Jira)
Artem Alekseev created ARROW-7085: - Summary: [C++][CSV] Add support for Extension type in csv reader Key: ARROW-7085 URL: https://issues.apache.org/jira/browse/ARROW-7085 Project: Apache Arrow

[NIGHTLY] Arrow Build Report for Job nightly-2019-11-07-0

2019-11-07 Thread Crossbow
Arrow Build Report for Job nightly-2019-11-07-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-07-0 Failed Tasks: - centos-7: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-07-0-azure-centos-7 - conda-linux-gcc-py37: URL:

[jira] [Created] (ARROW-7084) [C++] ArrayRangeEquals should check for full type equality?

2019-11-07 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-7084: -- Summary: [C++] ArrayRangeEquals should check for full type equality? Key: ARROW-7084 URL: https://issues.apache.org/jira/browse/ARROW-7084 Project: Apache Arrow

Re: [Java] Append multiple record batches together?

2019-11-07 Thread Fan Liya
Hi Micah, Thanks for bringing this up. > 1. An efficient solution already exists? It seems like TransferPair implementations could possibly be improved upon or have they already been optimized? Fundamentally, memory copy is unavoidable, IMO, because the source and target memory regions are like