Re: [VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-01 Thread Wes McKinney
+1 (binding) Thanks Kou for adding the missing signatures. * I was able to verify the binaries after the signature fix. The Linux package tests are very nice! * I ran the following source verifications (on linux except where noted) * C++ (Ubuntu 19.04 and Windows, with patch https://github.com/

Re: [Discuss] Compatibility Guarantees and Versioning Post "1.0.0"

2019-07-01 Thread Micah Kornfield
Hi Wes, Thanks for your response. Regarding the protocol negotiation, your description of feature reporting (snipped below) is along the lines of what I was thinking. It might not be necessary for 1.0.0, but at some point might become useful. > Note that we don't really have a mechanism for

Re: [Discuss] IPC Specification, flatbuffers and unaligned memory accesses

2019-07-01 Thread Wes McKinney
Thanks for the references. If we decided to make a change around this, we could call the first 4 bytes a stream continuation marker to make it slightly less ugly * 0x: continue * 0x: stop On Mon, Jul 1, 2019 at 4:35 PM Micah Kornfield wrote: > > Hi Wes, > I'm not an expert on th
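As a rough illustration of the marker idea above (a sketch, not the Arrow implementation): a reader inspects the first 4 bytes of each stream message and either continues or stops. The marker byte values are not legible in this archive, so the sketch takes them as parameters. `Action`, `classify_prefix`, the `LEGACY` fallback, and the byte values in the usage lines are all hypothetical.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    STOP = "stop"
    # Assumption: a prefix that is neither marker comes from data written
    # before markers existed, and would be handled by the old code path.
    LEGACY = "legacy"


def classify_prefix(prefix: bytes, continue_marker: bytes, stop_marker: bytes) -> Action:
    """Classify the first 4 bytes of a stream message.

    The marker values are passed in because they are assumptions here,
    not values taken from the Arrow format.
    """
    if len(prefix) != 4:
        raise ValueError("expected exactly 4 bytes")
    if prefix == continue_marker:
        return Action.CONTINUE
    if prefix == stop_marker:
        return Action.STOP
    return Action.LEGACY


# Hypothetical marker values, chosen only for this example.
CONT = b"\x01\x02\x03\x04"
STOP = b"\x05\x06\x07\x08"
first = classify_prefix(b"\x01\x02\x03\x04", CONT, STOP)
```

Keeping the markers as parameters means the sketch makes no claim about which values the format would actually reserve.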

Re: [Discuss] IPC Specification, flatbuffers and unaligned memory accesses

2019-07-01 Thread Micah Kornfield
Hi Wes, I'm not an expert on this either, my inclination mostly comes from some research I've done. I think it is important to distinguish two cases: 1. unaligned access at the processor instruction level 2. undefined behavior From my reading unaligned access is fine on most modern architectur
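To make the two cases concrete, here is a minimal Python sketch of reading a 32-bit integer that starts at an odd byte offset. It decodes the bytes rather than reinterpreting a pointer, which is roughly the analogue of using `memcpy` in C or C++ instead of dereferencing a cast pointer. The helper name `read_int32_le` and the sample payload are hypothetical and not taken from the thread.

```python
import struct


def read_int32_le(buf: bytes, offset: int) -> int:
    """Read a little-endian int32 at any byte offset, aligned or not.

    With the "<" prefix, struct uses standard sizes and no alignment,
    so the offset does not need to be a multiple of 4.
    """
    return struct.unpack_from("<i", buf, offset)[0]


# One padding byte, so the int32 starts at an odd (unaligned) offset.
payload = b"\x00" + struct.pack("<i", 1234)
value = read_int32_le(payload, 1)
```

The read goes through a byte-wise decode, so it does not depend on how the underlying buffer happens to be aligned in memory.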

Re: [VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-01 Thread Sutou Kouhei
Hi, > but it failed with > > https://gist.github.com/wesm/711ae3d66c942db293dba55ff237871a Thanks for catching this. I had failed to upload some files; I have now uploaded the missing ones. I confirmed that there are no missing files with the following Ruby script: -- #!/usr/bin/env ruby require "open-uri" r

Re: [VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-01 Thread Sutou Kouhei
Hi, Thanks for verifying this RC. > - failed to verify binaries > > """ > + echo 'Failed to verify release candidate. See /tmp/arrow-0.14.0.gucvU > for details.' > Failed to verify release candidate. See /tmp/arrow-0.14.0.gucvU for details. > """ > > There's no details in /tmp/arrow-0.14.0.gucv

Re: Spark and Arrow Flight

2019-07-01 Thread Wes McKinney
On Mon, Jul 1, 2019 at 3:50 PM David Li wrote: > > I think I'd prefer #3 over overloading an existing call (#2). > > We've been thinking about a similar issue, where sometimes we want > just the schema, but the service can't necessarily return the schema > without fetching data - right now we retu

Tracking running threads to close prior to Arrow 1.0.0 release

2019-07-01 Thread Wes McKinney
I started a Google Document to try to assemble outstanding discussion threads with links to the mailing list so we do not lose track of the various items that are up in the air. The document is not complete -- if you would like edit access to the document, please request it and I will add you. Feel fr

Re: Spark and Arrow Flight

2019-07-01 Thread David Li
I think I'd prefer #3 over overloading an existing call (#2). We've been thinking about a similar issue, where sometimes we want just the schema, but the service can't necessarily return the schema without fetching data - right now we return a sentinel value in GetFlightInfo, but a separate RPC wo

Re: [Discuss] IPC Specification, flatbuffers and unaligned memory accesses

2019-07-01 Thread Wes McKinney
The <0x> solution is downright ugly but I think it's one of the only ways that achieves * backward compatibility (new clients can read old data) * opt-in forward compatibility (if we want to go to the labor of doing so, sort of dangerous) * old clients receiving new data do not blow up (th

Re: Spark and Arrow Flight

2019-07-01 Thread Wes McKinney
My inclination is either #2 or #3. #4 is an option of course, but I like the more structured solution of explicitly requesting the schema given a descriptor. In both cases, it's possible that schemas are sent twice, e.g. if you call GetSchema and then later call GetFlightInfo and so you receive th

[jira] [Created] (ARROW-5820) [Release] Remove undefined variable check from verify script

2019-07-01 Thread Sutou Kouhei (JIRA)
Sutou Kouhei created ARROW-5820: --- Summary: [Release] Remove undefined variable check from verify script Key: ARROW-5820 URL: https://issues.apache.org/jira/browse/ARROW-5820 Project: Apache Arrow

Re: [Discuss] Compatibility Guarantees and Versioning Post "1.0.0"

2019-07-01 Thread Wes McKinney
hi Micah, Sorry for the delay in feedback. I looked at the document and it seems like a reasonable perspective about forward- and backward-compatibility. It seems like the main thing you are proposing is to apply Semantic Versioning to Format and Library versions separately. That's an interesting
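One way to picture versioning the Format and the Library separately is sketched below. The compatibility rule in `can_read` (same major version, reader minor at least the data's minor) is an assumption made for illustration and is not the rule proposed in the document under discussion. `Version`, `can_read`, and the version numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Expects exactly three dot-separated integers, e.g. "1.2.0".
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)


def can_read(reader_format: Version, data_format: Version) -> bool:
    """Hypothetical rule: same major version, and reader minor >= data minor."""
    return (reader_format.major == data_format.major
            and reader_format.minor >= data_format.minor)


# A library release carries its own version plus the format version it implements.
release = {"library": Version.parse("3.1.0"), "format": Version.parse("1.2.0")}
readable = can_read(release["format"], Version.parse("1.0.0"))
```

Here the library version can advance on its own schedule, while the compatibility check looks only at the format version.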

Re: RecordBatch with Tensors/Arrays

2019-07-01 Thread Wes McKinney
hi Andrew, I'm copying dev@ just so more folks are in the loop On Wed, Jun 19, 2019 at 9:13 AM Andrew Spott wrote: > > I was told to post this here, rather than as an issue on Github. > > > > I'm looking to serialize data that looks something like this: > > ``` > record = { "predicted": , >

[jira] [Created] (ARROW-5819) [Python] Store sequences of arbitrary ndarrays (with same type) in Tensor value type

2019-07-01 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5819: --- Summary: [Python] Store sequences of arbitrary ndarrays (with same type) in Tensor value type Key: ARROW-5819 URL: https://issues.apache.org/jira/browse/ARROW-5819 Proj

[jira] [Created] (ARROW-5818) [Java][Gandiva] support varlen output vectors

2019-07-01 Thread Pindikura Ravindra (JIRA)
Pindikura Ravindra created ARROW-5818: - Summary: [Java][Gandiva] support varlen output vectors Key: ARROW-5818 URL: https://issues.apache.org/jira/browse/ARROW-5818 Project: Apache Arrow

Re: [VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-01 Thread Wes McKinney
The C++/Python source build looks fine to me on the Windows side -- I added Flight support in https://github.com/apache/arrow/pull/4770 I opened https://issues.apache.org/jira/browse/ARROW-5817 as there is a risk that Flight Python tests might be silently skipped. We check in our Python package b

[jira] [Created] (ARROW-5817) [Python] Use pytest marks for Flight test to avoid silently skipping unit tests due to import failures

2019-07-01 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5817: --- Summary: [Python] Use pytest marks for Flight test to avoid silently skipping unit tests due to import failures Key: ARROW-5817 URL: https://issues.apache.org/jira/browse/ARROW-5817

Re: [VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-01 Thread Wes McKinney
hi Antoine, I'm not sure of the origin of the conda.sh failure. Have you tried removing any bashrc stuff related to the Anaconda distribution that you develop against? With the following patch I'm able to run the binary verification https://github.com/apache/arrow/pull/4768 but it failed with http

[jira] [Created] (ARROW-5816) [Release] Parallel curl does not work reliably in verify-release-candidate-sh

2019-07-01 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5816: --- Summary: [Release] Parallel curl does not work reliably in verify-release-candidate-sh Key: ARROW-5816 URL: https://issues.apache.org/jira/browse/ARROW-5816 Project: Ap

[jira] [Created] (ARROW-5815) [Java] Support swap functionality for fixed-width vectors

2019-07-01 Thread Liya Fan (JIRA)
Liya Fan created ARROW-5815: --- Summary: [Java] Support swap functionality for fixed-width vectors Key: ARROW-5815 URL: https://issues.apache.org/jira/browse/ARROW-5815 Project: Apache Arrow Issue Ty

[jira] [Created] (ARROW-5814) [Java] Implement a HashMap for DictionaryEncoder

2019-07-01 Thread Ji Liu (JIRA)
Ji Liu created ARROW-5814: - Summary: [Java] Implement a HashMap for DictionaryEncoder Key: ARROW-5814 URL: https://issues.apache.org/jira/browse/ARROW-5814 Project: Apache Arrow Issue Type: Improve

Re: [VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-01 Thread Antoine Pitrou
On Ubuntu 18.04: - failed to verify binaries """ + echo 'Failed to verify release candidate. See /tmp/arrow-0.14.0.gucvU for details.' Failed to verify release candidate. See /tmp/arrow-0.14.0.gucvU for details. """ There's no details in /tmp/arrow-0.14.0.gucvU. The script left a lot of zombie