Re: [Discuss] Do a 0.15.0 release before 1.0.0?

2019-07-22 Thread Micah Kornfield
I think the main reason to do a release before 1.0.0 is if we want to make the change that would give a good error message for forward incompatibility (I think this could be done as 0.14.2 since it would just be clarifying an error message). Otherwise, I think including it in 1.0.0 would be fine (

Re: [Memo] API Behavior changes

2019-07-22 Thread Fan Liya
@Wes McKinney, Thanks for the good suggestion. Best, Liya Fan On Mon, Jul 22, 2019 at 8:23 PM Wes McKinney wrote: > You could also use labels in JIRA to mark issues that introduce API changes > > On Mon, Jul 22, 2019 at 4:42 AM Fan Liya wrote: > > > > @Uwe L. Korn > > > > Thanks a lot for the

Re: [DISCUSS][JAVA] Designs & goals for readers/writers

2019-07-22 Thread Micah Kornfield
Hi Wes, Are there currently files that need to be moved? Thanks, Micah On Monday, July 22, 2019, Wes McKinney wrote: > Sort of tangentially related, but while we are on the topic: > > Please, if you would, avoid checking binary test data files into the > main repository. Use https://github.com/

Re: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-07-22 Thread Jacques Nadeau
There are two main things that have been important to us in Dremio around threading: Separate threading model from algorithms. We chose to do parallelization at the engine level instead of the operation level. This allows us to substantially increase parallelization while still maintaining a stron
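To make the "engine level vs. operation level" distinction concrete, here is a minimal C++ sketch. `SumChunk` and `ParallelSum` are hypothetical names, not Dremio or Arrow APIs: the kernel is plain single-threaded code, and only the engine function decides how many threads run and which chunks each one handles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical kernel: a plain single-threaded algorithm over one chunk.
// It knows nothing about threads, pools, or scheduling.
int64_t SumChunk(const std::vector<int64_t>& chunk) {
  return std::accumulate(chunk.begin(), chunk.end(), int64_t{0});
}

// Hypothetical "engine": owns all parallelism. It picks the thread count,
// assigns chunks to threads round-robin, and combines partial results.
int64_t ParallelSum(const std::vector<std::vector<int64_t>>& chunks,
                    unsigned num_threads) {
  assert(num_threads > 0 && "engine needs at least one thread");
  // One slot per chunk; each slot is written by exactly one thread.
  std::vector<int64_t> partials(chunks.size(), 0);
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    workers.emplace_back([&chunks, &partials, t, num_threads] {
      for (std::size_t i = t; i < chunks.size(); i += num_threads) {
        partials[i] = SumChunk(chunks[i]);
      }
    });
  }
  for (auto& w : workers) w.join();  // join() makes the partials visible here
  return std::accumulate(partials.begin(), partials.end(), int64_t{0});
}
```

Because the kernel never touches threads, the engine can change its scheduling policy (thread count, chunk assignment) without rewriting any algorithm. Most Unix toolchains need `-pthread` to build this.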

Re: [VOTE] Release Apache Arrow 0.14.1 - RC0

2019-07-22 Thread Krisztián Szűcs
On Tue, Jul 23, 2019 at 12:31 AM Krisztián Szűcs wrote: > The remaining tasks are: > - Updating website (after https://github.com/apache/arrow/pull/4922 is > merged) > I'm generating the apidocs and updating the changelog. I can send the ANNOUNCEMENT once the site gets updated. > - Update JavaSc

[jira] [Created] (ARROW-6010) [Release] JAVA_HOME is improperly set in the gen apidocs docker container

2019-07-22 Thread Krisztian Szucs (JIRA)

Krisztian Szucs created ARROW-6010: -- Summary: [Release] JAVA_HOME is improperly set in the gen apidocs docker container Key: ARROW-6010 URL: https://issues.apache.org/jira/browse/ARROW-6010 Project:

[jira] [Created] (ARROW-6009) [Release][JS] Ignore NPM errors in the javascript release script

2019-07-22 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-6009: -- Summary: [Release][JS] Ignore NPM errors in the javascript release script Key: ARROW-6009 URL: https://issues.apache.org/jira/browse/ARROW-6009 Project: Apache Ar

[jira] [Created] (ARROW-6008) [Release] Don't parallelize the bintray upload script

2019-07-22 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-6008: -- Summary: [Release] Don't parallelize the bintray upload script Key: ARROW-6008 URL: https://issues.apache.org/jira/browse/ARROW-6008 Project: Apache Arrow

[jira] [Created] (ARROW-6007) [Release] Use SNAPSHOT versions in pom.xml files after release

2019-07-22 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-6007: -- Summary: [Release] Use SNAPSHOT versions in pom.xml files after release Key: ARROW-6007 URL: https://issues.apache.org/jira/browse/ARROW-6007 Project: Apache Arro

Re: [VOTE] Release Apache Arrow 0.14.1 - RC0

2019-07-22 Thread Krisztián Szűcs
The remaining tasks are: - Updating website (after https://github.com/apache/arrow/pull/4922 is merged) - Update JavaScript packages - Update R packages On Mon, Jul 22, 2019 at 9:52 PM Krisztián Szűcs wrote: > Added a warning about that. > > On Mon, Jul 22, 2019 at 9:38 PM Wes McKinney wrote: >

[jira] [Created] (ARROW-6006) [C++] Error reading an empty IPC stream with a dictionary-encoded column

2019-07-22 Thread Steven Fackler (JIRA)
Steven Fackler created ARROW-6006: - Summary: [C++] Error reading an empty IPC stream with a dictionary-encoded column Key: ARROW-6006 URL: https://issues.apache.org/jira/browse/ARROW-6006 Project: Apa

[jira] [Created] (ARROW-6005) arrow::FileReader::GetRecordBatchReader() does not behave as documented since ARROW-1012

2019-07-22 Thread Martin (JIRA)
Martin created ARROW-6005: - Summary: arrow::FileReader::GetRecordBatchReader() does not behave as documented since ARROW-1012 Key: ARROW-6005 URL: https://issues.apache.org/jira/browse/ARROW-6005 Project: Apa

[jira] [Created] (ARROW-6004) [C++] CSV reader ignore_empty_lines option doesn't handle empty lines

2019-07-22 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-6004: -- Summary: [C++] CSV reader ignore_empty_lines option doesn't handle empty lines Key: ARROW-6004 URL: https://issues.apache.org/jira/browse/ARROW-6004 Project: Apac

[jira] [Created] (ARROW-6003) [C++] Better input validation and error messaging in CSV reader

2019-07-22 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-6003: -- Summary: [C++] Better input validation and error messaging in CSV reader Key: ARROW-6003 URL: https://issues.apache.org/jira/browse/ARROW-6003 Project: Apache Arr

[jira] [Created] (ARROW-6002) [C++][Gandiva] TestCastFunctions does not test int64 casting

2019-07-22 Thread Benjamin Kietzman (JIRA)

Benjamin Kietzman created ARROW-6002: Summary: [C++][Gandiva] TestCastFunctions does not test int64 casting Key: ARROW-6002 URL: https://issues.apache.org/jira/browse/ARROW-6002 Project: Apache A

Re: [VOTE] Release Apache Arrow 0.14.1 - RC0

2019-07-22 Thread Krisztián Szűcs
Added a warning about that. On Mon, Jul 22, 2019 at 9:38 PM Wes McKinney wrote: > hi folks -- we had a small snafu with the post-release tasks because > this patch release did not follow our normal release procedure where > the release candidate is usually based off of master. > > When we prepar

Re: [DISCUSS][JAVA] Designs & goals for readers/writers

2019-07-22 Thread Wes McKinney
Sort of tangentially related, but while we are on the topic: Please, if you would, avoid checking binary test data files into the main repository. Use https://github.com/apache/arrow-testing if you truly need to check in binary data -- something to look out for in code reviews On Mon, Jul 22, 201

Re: [VOTE] Release Apache Arrow 0.14.1 - RC0

2019-07-22 Thread Wes McKinney
hi folks -- we had a small snafu with the post-release tasks because this patch release did not follow our normal release procedure where the release candidate is usually based off of master. When we prepare a patch release that is based on backported commits into a maintenance branch, we DO NOT n

Re: [Discuss] Do a 0.15.0 release before 1.0.0?

2019-07-22 Thread Wes McKinney
I'd be satisfied with fixing the Flatbuffer alignment issue either in a 0.15.0 or 1.0.0. In the interest of expediency, though, making a 0.15.0 with this change sooner rather than later might be prudent. On Mon, Jul 22, 2019 at 12:35 PM Antoine Pitrou wrote: > > > Hello, > > Recently we've discus

Re: [DISCUSS] [Gandiva] Adding query plan to Gandiva protobuf definition

2019-07-22 Thread Andy Grove
Thanks, Jacques and Wes. I agree that this needs discussion and a design document. I have put together this Google doc to get the ball rolling: https://docs.google.com/document/d/1Uv1FmPs7uYMLoJUH1EF0oxm-ujtz1h1tJFl0zN60TIg/edit?usp=sharing Thanks, Andy. On Mon, Jul 22, 2019 at 6:39 AM Wes McK

[jira] [Created] (ARROW-6001) Add from_pydict(), from_pylist() and to_pylist() to pyarrow.Table + improve pandas.to_dict()

2019-07-22 Thread David Lee (JIRA)
David Lee created ARROW-6001: Summary: Add from_pydict(), from_pylist() and to_pylist() to pyarrow.Table + improve pandas.to_dict() Key: ARROW-6001 URL: https://issues.apache.org/jira/browse/ARROW-6001 Pr

[jira] [Created] (ARROW-6000) [Python] Expose LargeBinaryType and LargeStringType

2019-07-22 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-6000: - Summary: [Python] Expose LargeBinaryType and LargeStringType Key: ARROW-6000 URL: https://issues.apache.org/jira/browse/ARROW-6000 Project: Apache Arrow Is

[Discuss] Do a 0.15.0 release before 1.0.0?

2019-07-22 Thread Antoine Pitrou
Hello, Recently we've discussed breaking the IPC format to fix a long-standing alignment issue. See this discussion: https://lists.apache.org/thread.html/8cea56f2069710ac128ff9129c744f0ef96a3e33a4d79d7e820019af@%3Cdev.arrow.apache.org%3E Should we first do a 0.15.0 in order to get those format
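The alignment issue itself is described in the linked thread and is not spelled out in this snippet. As a hedged illustration of the general problem only, assuming an 8-byte alignment requirement for the Flatbuffer metadata and a 4-byte length prefix in front of it (both are assumptions for this sketch), the hypothetical helpers below show how such a prefix can leave the metadata misaligned and how padding is usually computed:

```cpp
#include <cassert>
#include <cstdint>

// Round `n` up to the next multiple of `alignment` (must be a power of two).
int64_t RoundUpToMultiple(int64_t n, int64_t alignment) {
  assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
  return (n + alignment - 1) & ~(alignment - 1);
}

// True if `offset` is a multiple of `alignment` (a power of two).
bool IsAligned(int64_t offset, int64_t alignment) {
  return RoundUpToMultiple(offset, alignment) == offset;
}

// Hypothetical layout: a message starts at `message_start` and carries a
// length prefix of `prefix_len` bytes before its metadata. Returns the
// offset at which the metadata begins.
int64_t MetadataOffset(int64_t message_start, int64_t prefix_len) {
  return message_start + prefix_len;
}

// Number of zero bytes to append after `used` bytes so that the next item
// starts on an `alignment`-byte boundary.
int64_t PaddingFor(int64_t used, int64_t alignment) {
  return RoundUpToMultiple(used, alignment) - used;
}
```

With `message_start = 0`, a 4-byte prefix puts the metadata at offset 4, which is not 8-byte aligned, while an 8-byte prefix puts it at offset 8, which is.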

Re: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-07-22 Thread Antoine Pitrou
On 22/07/2019 at 18:52, Wes McKinney wrote: > > Probably the way is to introduce async-capable read APIs into the file > interfaces. For example: > > file->ReadAsyncBlock(thread_ctx, ...); > > That way the file implementation can decide whether asynchronous logic > is actually needed. > I do
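A minimal sketch of what "the file implementation can decide" might look like. The class names are hypothetical, the `thread_ctx` argument is left out, and this is not Arrow's actual file interface: callers always use `ReadAsync`, an in-memory file returns an already-completed future, and a file whose reads may block hands the work to another thread via `std::async`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <string>
#include <utility>

// Copy `length` bytes starting at `offset`, clamped to the end of `data`.
std::string SliceBytes(const std::string& data, int64_t offset, int64_t length) {
  assert(offset >= 0 && length >= 0);
  if (offset >= static_cast<int64_t>(data.size())) return std::string();
  return data.substr(static_cast<std::size_t>(offset),
                     static_cast<std::size_t>(length));
}

// Hypothetical interface (not Arrow's API): every read goes through
// ReadAsync, and each implementation decides how to fulfil it.
class ReadableFileSketch {
 public:
  virtual ~ReadableFileSketch() = default;
  virtual std::future<std::string> ReadAsync(int64_t offset, int64_t length) = 0;
};

// Data already in memory: finish the read on the calling thread and
// return a future that is already ready. No extra thread is involved.
class InMemoryFile : public ReadableFileSketch {
 public:
  explicit InMemoryFile(std::string data) : data_(std::move(data)) {}
  std::future<std::string> ReadAsync(int64_t offset, int64_t length) override {
    std::promise<std::string> done;
    done.set_value(SliceBytes(data_, offset, length));
    return done.get_future();
  }

 private:
  std::string data_;
};

// Stand-in for a source whose reads may block (disk, network): run the
// read on another thread so the caller can do other work meanwhile.
// The file object must outlive any future it returns.
class BlockingFile : public ReadableFileSketch {
 public:
  explicit BlockingFile(std::string data) : data_(std::move(data)) {}
  std::future<std::string> ReadAsync(int64_t offset, int64_t length) override {
    return std::async(std::launch::async, [this, offset, length] {
      return SliceBytes(data_, offset, length);
    });
  }

 private:
  std::string data_;
};
```

In a fuller design the caller would pass something like the `thread_ctx` from the quoted message, so that the implementation could schedule onto a shared pool instead of spawning a thread per read; that part is omitted here.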

Re: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-07-22 Thread Wes McKinney
On Mon, Jul 22, 2019 at 11:42 AM Antoine Pitrou wrote: > > On Mon, 22 Jul 2019 11:07:43 -0500 > Wes McKinney wrote: > > > > Right, which is why I'm suggesting a simple model to allow threads > > that are waiting on IO to allow other threads to execute. > > If you are doing memory-mapped IO, how d

Re: Error building cuDF on new Arrow with std::variant backport

2019-07-22 Thread Keith Kraus
We're working on that now, will report back once we have something more concrete to act on. Thanks! -Keith On 7/22/19, 12:51 PM, "Antoine Pitrou" wrote: Hi Keith, Can you try to further reduce your reproducer until you find the offending construct? Re

Re: Error building cuDF on new Arrow with std::variant backport

2019-07-22 Thread Antoine Pitrou
Hi Keith, Can you try to further reduce your reproducer until you find the offending construct? Regards Antoine. On 22/07/2019 at 18:46, Keith Kraus wrote: > I temporarily removed the csr related code that has the namespace clash and > confirmed that the same compilation warnin

Re: Error building cuDF on new Arrow with std::variant backport

2019-07-22 Thread Keith Kraus
I temporarily removed the csr related code that has the namespace clash and confirmed that the same compilation warnings and errors still occur. On 7/20/19, 1:03 AM, "Micah Kornfield" wrote: The namespace collision is a definite possibility, especially if you are using g++ which seems

Re: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-07-22 Thread Antoine Pitrou
On Mon, 22 Jul 2019 11:07:43 -0500 Wes McKinney wrote: > > Right, which is why I'm suggesting a simple model to allow threads > that are waiting on IO to allow other threads to execute. If you are doing memory-mapped IO, how do you plan to tell whether and when you'll be going to wait for IO? R

[jira] [Created] (ARROW-5999) [C++] Required header files missing when built with -DARROW_DATASET=OFF

2019-07-22 Thread Steven Fackler (JIRA)
Steven Fackler created ARROW-5999: - Summary: [C++] Required header files missing when built with -DARROW_DATASET=OFF Key: ARROW-5999 URL: https://issues.apache.org/jira/browse/ARROW-5999 Project: Apac

Re: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-07-22 Thread Wes McKinney
On Mon, Jul 22, 2019 at 10:49 AM Antoine Pitrou wrote: > > > On 18/07/2019 at 00:25, Wes McKinney wrote: > > > > * We look forward in the stream until we find a complete Thrift data > > page header. This may trigger 0 or more (possibly multiple) Read calls > > to the underlying "file" handle. In

Re: [DISCUSS] Format additions for encoding/compression (Was: [Discuss] Format additions to Arrow for sparse data and data integrity)

2019-07-22 Thread Antoine Pitrou
On Mon, 22 Jul 2019 08:40:08 -0700 Brian Hulette wrote: > To me, the most important aspect of this proposal is the addition of sparse > encodings, and I'm curious if there are any more objections to that > specifically. So far I believe the only one is that it will make > computation libraries mor

Re: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-07-22 Thread Antoine Pitrou
On 18/07/2019 at 00:25, Wes McKinney wrote: > > * We look forward in the stream until we find a complete Thrift data > page header. This may trigger 0 or more (possibly multiple) Read calls > to the underlying "file" handle. In the default case, the data is all > actually in memory so the read
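The look-ahead step quoted above can be sketched as a loop that checks for a complete header before each read. To keep the sketch self-contained, it swaps Thrift for a hypothetical header format (a 4-byte little-endian length followed by that many bytes). `MemorySource`, `MakeHeader`, and `BufferUntilHeader` are illustrative names, not Parquet or Arrow APIs. If the buffer already holds a full header, no `Read` call happens; otherwise one or more calls are made.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical header format (NOT Thrift): a 4-byte little-endian length,
// then that many payload bytes. This encoder builds test streams.
std::string MakeHeader(const std::string& payload) {
  const uint32_t len = static_cast<uint32_t>(payload.size());
  std::string out;
  for (int i = 0; i < 4; ++i) {
    out.push_back(static_cast<char>((len >> (8 * i)) & 0xFF));
  }
  return out + payload;
}

// Total header size if `buf` starts with a complete header, else nullopt.
std::optional<std::size_t> CompleteHeaderSize(const std::string& buf) {
  if (buf.size() < 4) return std::nullopt;
  uint32_t len = 0;
  for (int i = 0; i < 4; ++i) {
    len |= static_cast<uint32_t>(static_cast<unsigned char>(buf[i])) << (8 * i);
  }
  const std::size_t total = 4 + static_cast<std::size_t>(len);
  if (buf.size() < total) return std::nullopt;
  return total;
}

// In-memory stand-in for the underlying "file" handle.
struct MemorySource {
  std::string data;
  std::size_t pos = 0;
  // Returns up to `n` bytes; an empty result means end of stream.
  std::string Read(std::size_t n) {
    std::string out = data.substr(pos, n);
    pos += out.size();
    return out;
  }
};

struct HeaderProbe {
  std::optional<std::size_t> header_size;  // nullopt if the stream ended first
  int num_reads = 0;                       // Read calls issued by this probe
};

// Look forward until `*buffered` holds a complete header, reading
// `chunk_size` bytes at a time from `src`.
HeaderProbe BufferUntilHeader(MemorySource& src, std::string* buffered,
                              std::size_t chunk_size) {
  HeaderProbe probe;
  while (true) {
    probe.header_size = CompleteHeaderSize(*buffered);
    if (probe.header_size) return probe;  // may happen with zero reads
    std::string more = src.Read(chunk_size);
    ++probe.num_reads;
    if (more.empty()) return probe;  // truncated stream: header_size is nullopt
    buffered->append(more);
  }
}
```

Calling `BufferUntilHeader` a second time on a buffer that already contains a full header issues zero reads, which mirrors the "0 or more" case in the quoted message.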

Re: [VOTE] Release Apache Arrow 0.14.1 - RC0

2019-07-22 Thread Krisztián Szűcs
Hi, The 0.14.1 RC0 vote carries with 4 binding +1 (and 1 non-binding +1) votes. Thanks for helping verify the RC! I'm moving on to the post-release tasks [1] once GitHub resolves its partially degraded service issues [2]. Any help is appreciated. - Krisztian [1]: https://cwiki.apache.org/conflue

Re: [DISCUSS] Format additions for encoding/compression (Was: [Discuss] Format additions to Arrow for sparse data and data integrity)

2019-07-22 Thread Brian Hulette
To me, the most important aspect of this proposal is the addition of sparse encodings, and I'm curious if there are any more objections to that specifically. So far I believe the only one is that it will make computation libraries more complicated. This is absolutely true, but I think it's worth th

Re: [DISCUSS][JAVA] Designs & goals for readers/writers

2019-07-22 Thread Micah Kornfield
Hi Jacques, Thanks for the clarifications. I think the distinction is useful. > If people want to write adapters for Arrow, I see that as useful but very > different than writing native implementations and we should try to create a > clear delineation between the two. What do you think about creat

Re: [VOTE] Release Apache Arrow 0.14.1 - RC0

2019-07-22 Thread Krisztián Szűcs
+1 (binding) Ran both the source and binary verification scripts on macOS Mojave. Also tested the wheels in Python Docker containers and on OSX. On Thu, Jul 18, 2019 at 11:48 PM Sutou Kouhei wrote: > +1 (binding) > > I ran the following on Debian GNU/Linux sid: > > * TEST_CSHARP=0 JAVA_HOME=

Re: [DISCUSS] [Gandiva] Adding query plan to Gandiva protobuf definition

2019-07-22 Thread Wes McKinney
I agree that I'd also like to see a design / goals document to clarify the scope (and the non-goals, too). In general, I would hesitate to add anything higher-level to the Gandiva protos -- there is already confusion from people who believe that Gandiva is a "query engine" where it is actually a qu

Re: [Memo] API Behavior changes

2019-07-22 Thread Wes McKinney
You could also use labels in JIRA to mark issues that introduce API changes On Mon, Jul 22, 2019 at 4:42 AM Fan Liya wrote: > > @Uwe L. Korn > > Thanks a lot for the good suggestion. > I will create a new file to track the changes. > > Best, > Liya Fan > > On Mon, Jul 22, 2019 at 5:03 PM Uwe L. K

[jira] [Created] (ARROW-5998) [Java] Open a document to track the API changes

2019-07-22 Thread Liya Fan (JIRA)
Liya Fan created ARROW-5998: --- Summary: [Java] Open a document to track the API changes Key: ARROW-5998 URL: https://issues.apache.org/jira/browse/ARROW-5998 Project: Apache Arrow Issue Type: Improv

Re: [Memo] API Behavior changes

2019-07-22 Thread Fan Liya
@Uwe L. Korn Thanks a lot for the good suggestion. I will create a new file to track the changes. Best, Liya Fan On Mon, Jul 22, 2019 at 5:03 PM Uwe L. Korn wrote: > Hello Liya, > > what about having this as part of the repository, e.g. > java/api-changes.md? We have an auto-generated changelo

Re: [Memo] API Behavior changes

2019-07-22 Thread Uwe L. Korn
Hello Liya, what about having this as part of the repository, e.g. java/api-changes.md? We have an auto-generated changelog that is quite verbose, but having such documentation for consumers of the Java library would be really helpful as it gives more densely packed information on upgrading versi

[jira] [Created] (ARROW-5997) [Java] Support dictionary encoding for Union type

2019-07-22 Thread Ji Liu (JIRA)
Ji Liu created ARROW-5997: - Summary: [Java] Support dictionary encoding for Union type Key: ARROW-5997 URL: https://issues.apache.org/jira/browse/ARROW-5997 Project: Apache Arrow Issue Type: New Feat