Re: [DISCUSS] PR Backlog reduction

2019-05-21 Thread Micah Kornfield
I agree on hand curation for now. I'll try to set up a sign-up spreadsheet for shepherding old PRs and, once that's done, assign reviewers/ping old PRs. I expect to have something to share by the weekend. On Tuesday, May 21, 2019, Wes McKinney wrote: > I think maintainers or contributors should be

Re: Should EOS be mandatory for IPC File format?

2019-05-21 Thread Micah Kornfield
This seems like a reasonable change. Is there any reason that we shouldn't always append EOS? On Tuesday, May 21, 2019, John Muehlhausen wrote: > Wes, > > Check out reader.cpp. It seg faults when it gets to the next > message-that-is-not-a-message... it is a footer. But I have no way to > know

Re: Should EOS be mandatory for IPC File format?

2019-05-21 Thread John Muehlhausen
Wes, Check out reader.cpp. It seg faults when it gets to the next message-that-is-not-a-message... it is a footer. But I have no way to know this in reader.cpp because I'm piping the File in via stdin. In seeker.cpp I seek to the end and figure out where the footer is (this is a py-arrow-writte

[jira] [Created] (ARROW-5392) [C++][CI][MinGW] Disable static library build on AppVeyor

2019-05-21 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-5392: --- Summary: [C++][CI][MinGW] Disable static library build on AppVeyor Key: ARROW-5392 URL: https://issues.apache.org/jira/browse/ARROW-5392 Project: Apache Arrow

Re: Should EOS be mandatory for IPC File format?

2019-05-21 Thread Wes McKinney
hi John, I'm not sure I follow. The EOS you're referring to is part of the streaming format. It's designed to be readable using an InputStream interface that does not support seeking at all. You can see the core logic where messages are popped off the InputStream here https://github.com/apache/ar

Should EOS be mandatory for IPC File format?

2019-05-21 Thread John Muehlhausen
https://arrow.apache.org/docs/format/IPC.html#file-format If this stream marker is optional in the file format, doesn't this prevent someone from reading the file without being able to seek() it, e.g. if it is "piped in" to a program? Or otherwise they'll have to stream in the entire thing befo
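The concern raised here can be illustrated with a toy framing scheme. This is only a sketch, not Arrow's actual IPC layout: it assumes each message is a 4-byte little-endian length followed by a payload, and that a zero length acts as the end-of-stream marker. The names `write_stream` and `read_messages` and the `FOOTER-BYTES` trailer are hypothetical. With the marker present, a reader that can only call `read()` knows where the messages stop and never tries to parse the trailing footer as a message.

```python
import io
import struct

# Assumption for this toy format: a zero length prefix marks end-of-stream.
EOS = struct.pack("<i", 0)


def write_stream(payloads, trailer=b""):
    """Frame each payload with a 4-byte length, then write EOS, then the trailer."""
    out = io.BytesIO()
    for p in payloads:
        out.write(struct.pack("<i", len(p)))
        out.write(p)
    out.write(EOS)
    out.write(trailer)  # stands in for a footer that is not a message
    return out.getvalue()


def read_messages(stream):
    """Yield payloads using only read(); stop at EOS or at a clean end of input."""
    while True:
        prefix = stream.read(4)
        if len(prefix) < 4:
            return  # input ended without an EOS marker
        (length,) = struct.unpack("<i", prefix)
        if length == 0:
            return  # EOS: whatever follows is not a message
        payload = stream.read(length)
        if len(payload) != length:
            raise ValueError("truncated message")
        yield payload


data = write_stream([b"batch-1", b"batch-2"], trailer=b"FOOTER-BYTES")
messages = list(read_messages(io.BytesIO(data)))
```

Without the marker, the same reader would interpret the first four footer bytes as a length prefix, which is the kind of misread the thread describes.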

[jira] [Created] (ARROW-5391) [Format] Move "Buffer" from Schema.fbs to Message.fbs?

2019-05-21 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5391: --- Summary: [Format] Move "Buffer" from Schema.fbs to Message.fbs? Key: ARROW-5391 URL: https://issues.apache.org/jira/browse/ARROW-5391 Project: Apache Arrow Iss

Re: [DISCUSS] PR Backlog reduction

2019-05-21 Thread Wes McKinney
I think maintainers or contributors should be responsible for closing PRs; it also helps with backlog curation (sometimes when a stale PR is closed the JIRA may also be closed if it's a Won't Fix). On Tue, May 21, 2019 at 1:12 PM Antoine Pitrou wrote: > > > > On 21/05/2019 at 20:02, Neal Richardson

Re: [Discuss][Format] Zero size record batches

2019-05-21 Thread Wes McKinney
https://github.com/apache/arrow/pull/3871 On Tue, May 21, 2019 at 1:26 PM Paul Taylor wrote: > > I'd be happy to PR a fix for JS today if someone can link me to Wes's PR. > > On 5/21/19 11:02 AM, Wes McKinney wrote: > > I agree also. As a practical use case, the results of a request made > > with

Re: [Discuss][Format] Zero size record batches

2019-05-21 Thread Paul Taylor
I'd be happy to PR a fix for JS today if someone can link me to Wes's PR. On 5/21/19 11:02 AM, Wes McKinney wrote: I agree also. As a practical use case, the results of a request made with Arrow Flight might yield an empty result set. I'm not sure if this needs to be formally noted in the specif

Re: [DISCUSS] PR Backlog reduction

2019-05-21 Thread Antoine Pitrou
On 21/05/2019 at 20:02, Neal Richardson wrote: > Automatically close stale PRs? https://github.com/probot/stale That doesn't sound like a good idea to me. Regards Antoine.

Re: [DISCUSS] PR Backlog reduction

2019-05-21 Thread Neal Richardson
Automatically close stale PRs? https://github.com/probot/stale On Tue, May 21, 2019 at 11:00 AM Wes McKinney wrote: > Any other thoughts about process to manage the backlog? > > On Thu, May 16, 2019 at 2:58 PM Wes McKinney wrote: > > > > hi Micah, > > > > This sounds like a reasonable proposal,

Re: [Discuss][Format] Zero size record batches

2019-05-21 Thread Wes McKinney
I agree also. As a practical use case, the results of a request made with Arrow Flight might yield an empty result set. I'm not sure if this needs to be formally noted in the specification documents but it might not hurt. If someone can fix the Java implementation we could enable the integration t

Re: [DISCUSS] PR Backlog reduction

2019-05-21 Thread Wes McKinney
Any other thoughts about process to manage the backlog? On Thu, May 16, 2019 at 2:58 PM Wes McKinney wrote: > > hi Micah, > > This sounds like a reasonable proposal, and I agree in particular for > regular contributors that it makes sense to close PRs that are not > close to being in merge-readin

[jira] [Created] (ARROW-5390) [CI] Job time limit exceeded on Travis

2019-05-21 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-5390: - Summary: [CI] Job time limit exceeded on Travis Key: ARROW-5390 URL: https://issues.apache.org/jira/browse/ARROW-5390 Project: Apache Arrow Issue Type: Bug

Re: Metadata for partitioned datasets in pyarrow.parquet

2019-05-21 Thread Richard Zamora
Thank you for the responses Wes and Joris! These summaries are very helpful to me. I decided to look into ARROW-5349 to get my feet wet, and just submitted a WIP PR (https://github.com/apache/arrow/pull/4361). If you get a chance, please take a look and provide feedback. I have limited exper

[jira] [Created] (ARROW-5389) [C++] Add an internal temporary directory API

2019-05-21 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-5389: - Summary: [C++] Add an internal temporary directory API Key: ARROW-5389 URL: https://issues.apache.org/jira/browse/ARROW-5389 Project: Apache Arrow Issue Ty

[jira] [Created] (ARROW-5388) [Go] use arrow.TypeEqual in array.NewChunked

2019-05-21 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5388: -- Summary: [Go] use arrow.TypeEqual in array.NewChunked Key: ARROW-5388 URL: https://issues.apache.org/jira/browse/ARROW-5388 Project: Apache Arrow Issue T

Re: [DISCUSS] Developing a "data frame" subproject in the Arrow C++ libraries

2019-05-21 Thread Wes McKinney
On Tue, May 21, 2019, 8:43 AM Antoine Pitrou wrote: > > On 21/05/2019 at 13:42, Wes McKinney wrote: > > hi Antoine, > > > > On Tue, May 21, 2019 at 5:48 AM Antoine Pitrou > wrote: > >> > >> > >> Hi Wes, > >> > >> How does copy-on-write play together with memory-mapped data? It seems > >> that

Re: [DISCUSS] Developing a "data frame" subproject in the Arrow C++ libraries

2019-05-21 Thread Antoine Pitrou
On 21/05/2019 at 13:42, Wes McKinney wrote: > hi Antoine, > > On Tue, May 21, 2019 at 5:48 AM Antoine Pitrou wrote: >> >> >> Hi Wes, >> >> How does copy-on-write play together with memory-mapped data? It seems >> that, depending on whether the memory map has several concurrent users >> (a co

[jira] [Created] (ARROW-5387) [Go] properly handle sub-slice of List

2019-05-21 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5387: -- Summary: [Go] properly handle sub-slice of List Key: ARROW-5387 URL: https://issues.apache.org/jira/browse/ARROW-5387 Project: Apache Arrow Issue Type: B

Re: [DISCUSS] Developing a "data frame" subproject in the Arrow C++ libraries

2019-05-21 Thread Wes McKinney
hi Antoine, On Tue, May 21, 2019 at 5:48 AM Antoine Pitrou wrote: > > > Hi Wes, > > How does copy-on-write play together with memory-mapped data? It seems > that, depending on whether the memory map has several concurrent users > (a condition which may be timing-dependent), we will either persis

Re: [DISCUSS] Developing a "data frame" subproject in the Arrow C++ libraries

2019-05-21 Thread Wes McKinney
Comments are on now, sorry about that. On Tue, May 21, 2019, 1:06 AM Micah Kornfield wrote: > Hi Wes, > It looks like comments are turned off on the doc, is this intentional? > > Thanks, > Micah > > On Mon, May 20, 2019 at 3:49 PM Wes McKinney wrote: > > > hi folks, > > > > I'm interested in start

Re: [pyarrow] Parquet page header size limit

2019-05-21 Thread shyam narayan singh
Hi, I have submitted the parent PR and the submodule PR. Regards Shyam On Tue, May 21, 2019 at 12:09 PM shyam narayan singh < shyambits2...@gmail.com> wrote: > Thanks Micah and Wes. Will try to submit a PR

Re: [DISCUSS] Developing a "data frame" subproject in the Arrow C++ libraries

2019-05-21 Thread Antoine Pitrou
Hi Wes, How does copy-on-write play together with memory-mapped data? It seems that, depending on whether the memory map has several concurrent users (a condition which may be timing-dependent), we will either persist changes on disk or make them ephemeral in memory. That doesn't sound very us
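The distinction between persisted and ephemeral changes asked about here can be seen at the operating-system level with Python's standard `mmap` module. This is a sketch of generic memory-map behavior only; it does not use Arrow's memory-mapping API and says nothing about how the proposed data frame library would behave. The helper names `modify_first_byte` and `read_file` are hypothetical. A private mapping (`ACCESS_COPY`) keeps writes in memory, while a shared writable mapping (`ACCESS_WRITE`) writes through to the file.

```python
import mmap
import os
import tempfile


def modify_first_byte(path, access):
    """Map the file with the given access mode and overwrite its first byte."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=access) as m:
            m[0:1] = b"X"
            if access == mmap.ACCESS_WRITE:
                m.flush()  # push the shared mapping's change to the file
            return bytes(m[:5])  # what this mapping sees after the write


def read_file(path):
    """Read the file contents back from disk."""
    with open(path, "rb") as f:
        return f.read()


fd, path = tempfile.mkstemp()
os.write(fd, b"hello")
os.close(fd)

# Private (copy-on-write) mapping: the change is visible only in memory.
seen_private = modify_first_byte(path, mmap.ACCESS_COPY)
on_disk_after_private = read_file(path)

# Shared writable mapping: the change reaches the file.
seen_shared = modify_first_byte(path, mmap.ACCESS_WRITE)
on_disk_after_shared = read_file(path)

os.remove(path)
```

In this sketch the mapping mode is chosen explicitly by the caller, so the outcome does not depend on how many users the map happens to have.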

[jira] [Created] (ARROW-5386) Making the rounding behavior of the buffer capacity configurable

2019-05-21 Thread Liya Fan (JIRA)
Liya Fan created ARROW-5386: --- Summary: Making the rounding behavior of the buffer capacity configurable Key: ARROW-5386 URL: https://issues.apache.org/jira/browse/ARROW-5386 Project: Apache Arrow

[jira] [Created] (ARROW-5385) [Go] implement EXTENSION datatype

2019-05-21 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5385: -- Summary: [Go] implement EXTENSION datatype Key: ARROW-5385 URL: https://issues.apache.org/jira/browse/ARROW-5385 Project: Apache Arrow Issue Type: New Fe

[jira] [Created] (ARROW-5384) [Go] add FixedSizeList array

2019-05-21 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5384: -- Summary: [Go] add FixedSizeList array Key: ARROW-5384 URL: https://issues.apache.org/jira/browse/ARROW-5384 Project: Apache Arrow Issue Type: New Feature

[jira] [Created] (ARROW-5383) [Go] update IPC flatbuf (new Duration type)

2019-05-21 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5383: -- Summary: [Go] update IPC flatbuf (new Duration type) Key: ARROW-5383 URL: https://issues.apache.org/jira/browse/ARROW-5383 Project: Apache Arrow Issue Ty

[jira] [Created] (ARROW-5382) SSE on ARM NEON

2019-05-21 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-5382: -- Summary: SSE on ARM NEON Key: ARROW-5382 URL: https://issues.apache.org/jira/browse/ARROW-5382 Project: Apache Arrow Issue Type: Improvement Co

[jira] [Created] (ARROW-5381) Crash at arrow::internal::CountSetBits

2019-05-21 Thread Tham (JIRA)
Tham created ARROW-5381: --- Summary: Crash at arrow::internal::CountSetBits Key: ARROW-5381 URL: https://issues.apache.org/jira/browse/ARROW-5381 Project: Apache Arrow Issue Type: Bug Environmen