[jira] [Created] (ARROW-5834) [Java] Apply new hash map in DictionaryEncoder

2019-07-02 Thread Ji Liu (JIRA)
Ji Liu created ARROW-5834: Summary: [Java] Apply new hash map in DictionaryEncoder Key: ARROW-5834 URL: https://issues.apache.org/jira/browse/ARROW-5834 Project: Apache Arrow Issue Type: New Feature

[jira] [Created] (ARROW-5833) [C++] Factor out status copying code from cast.cc

2019-07-02 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-5833: Summary: [C++] Factor out status copying code from cast.cc Key: ARROW-5833 URL: https://issues.apache.org/jira/browse/ARROW-5833 Project: Apache Arrow Is

[jira] [Created] (ARROW-5832) Support search operations for vector data

2019-07-02 Thread Liya Fan (JIRA)
Liya Fan created ARROW-5832: Summary: Support search operations for vector data Key: ARROW-5832 URL: https://issues.apache.org/jira/browse/ARROW-5832 Project: Apache Arrow Issue Type: New Feature

[jira] [Created] (ARROW-5831) [Release] Migrate and improve binary release verification script

2019-07-02 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-5831: Summary: [Release] Migrate and improve binary release verification script Key: ARROW-5831 URL: https://issues.apache.org/jira/browse/ARROW-5831 Project: Apache Ar

[jira] [Created] (ARROW-5830) [C++] Stop using memcmp in TensorEquals

2019-07-02 Thread Kenta Murata (JIRA)
Kenta Murata created ARROW-5830: Summary: [C++] Stop using memcmp in TensorEquals Key: ARROW-5830 URL: https://issues.apache.org/jira/browse/ARROW-5830 Project: Apache Arrow Issue Type: Improv

Re: [DISCUSS] 32- and 64-bit decimal types

2019-07-02 Thread Wes McKinney
That's certainly an option, too. On Tue, Jul 2, 2019 at 9:40 PM Micah Kornfield wrote: > > Hi Wes, > Just a question, I'm ok going either way on this but why not a new variable > width decimal type and deprecating the old one instead of breaking forward > compatibility? > > Thanks, > Micah > > On

Re: [DISCUSS] 32- and 64-bit decimal types

2019-07-02 Thread Micah Kornfield
Hi Wes, Just a question, I'm ok going either way on this but why not a new variable width decimal type and deprecating the old one instead of breaking forward compatibility? Thanks, Micah On Tuesday, July 2, 2019, Wes McKinney wrote: > Note that if we do make this change as described, it will p

Re: [VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-02 Thread Ravindra Pindikura
ok, thanks ! On Wed, Jul 3, 2019 at 7:10 AM Sutou Kouhei wrote: > Hi, > > Thanks for verifying this RC! > > > 2. The package doesn't seem to include gandiva > > > > is that intentional ? I'm fine if it is not included, just want to > confirm > > if that's expected. > > I think that this is cause

Re: [VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-02 Thread Ravindra Pindikura
On Wed, Jul 3, 2019 at 7:06 AM Wes McKinney wrote: > For the record, because Flight is so new and isn't being tested by very > many contributors in their environments, I would expect a lot of problems > and don't think they pose an issue for releasing. Let's open follow up > JIRAs > done, ARROW-

Re: [VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-02 Thread Ravindra Pindikura
On Wed, Jul 3, 2019 at 7:04 AM Wes McKinney wrote: > @Ravindra, could you clarify what point #2 means? > As part of the release verify script, the mvn tests are run for all modules (memory, vector, flight, jdbc, ..). I saw that the gandiva tests aren't running as part of that. @kou confirmed th

[jira] [Created] (ARROW-5829) [Java] failure in TestServerOptions.domainSocket

2019-07-02 Thread Pindikura Ravindra (JIRA)
Pindikura Ravindra created ARROW-5829: Summary: [Java] failure in TestServerOptions.domainSocket Key: ARROW-5829 URL: https://issues.apache.org/jira/browse/ARROW-5829 Project: Apache Arrow

Re: [VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-02 Thread Sutou Kouhei
Hi, Thanks for verifying this RC! > 2. The package doesn't seem to include gandiva > > is that intentional ? I'm fine if it is not included, just want to confirm > if that's expected. I think that this is caused by "-P arrow-jni" missing in 01-perform.sh: https://github.com/apache/arrow/p

Re: [VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-02 Thread Wes McKinney
For the record, because Flight is so new and isn't being tested by very many contributors in their environments, I would expect a lot of problems and don't think they pose an issue for releasing. Let's open follow up JIRAs On Tue, Jul 2, 2019, 8:34 PM Wes McKinney wrote: > @Ravindra, could you c

Re: [VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-02 Thread Wes McKinney
@Ravindra, could you clarify what point #2 means? On Tue, Jul 2, 2019, 8:29 PM Sutou Kouhei wrote: > Hi, > > Thanks for your report. > > Adding the -DProtobuf_SOURCE=BUNDLED CMake option is a workaround. > I don't think that this is a critical problem for this RC. > > We will be able to avoid this pro

Re: [VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-02 Thread Sutou Kouhei
Hi, Thanks for your report. Adding the -DProtobuf_SOURCE=BUNDLED CMake option is a workaround. I don't think that this is a critical problem for this RC. We will be able to avoid this problem automatically by the following patch in 1.0.0: https://github.com/apache/arrow/pull/4785 Thanks, -- kou I

Re: [VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-02 Thread Ravindra Pindikura
I tried "./dev/release/verify-release-candidate.sh source 0.14.0 0" on mac mojave. 1. I consistently get this error with flight tests [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.04 s <<< FAILURE! - in org.apache.arrow.flight.TestServerOptions [ERROR] domainSocket(org

[jira] [Created] (ARROW-5828) [C++] Add Protocol Buffers version check

2019-07-02 Thread Sutou Kouhei (JIRA)
Sutou Kouhei created ARROW-5828: Summary: [C++] Add Protocol Buffers version check Key: ARROW-5828 URL: https://issues.apache.org/jira/browse/ARROW-5828 Project: Apache Arrow Issue Type: Impro

Re: [VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-02 Thread Sutou Kouhei
> I tried again (Ubuntu 18.04): > * source verification failed in gRPC configure step: > Problem is, Ubuntu's c-ares does not provide any CMake files: Note: Adding the -Dc-ares_SOURCE=BUNDLED CMake option is a workaround. We can use bundled c-ares automatically by requiring c-ares's CMake config: htt

[jira] [Created] (ARROW-5827) [C++] Require c-ares CMake config

2019-07-02 Thread Sutou Kouhei (JIRA)
Sutou Kouhei created ARROW-5827: Summary: [C++] Require c-ares CMake config Key: ARROW-5827 URL: https://issues.apache.org/jira/browse/ARROW-5827 Project: Apache Arrow Issue Type: Improvement

Re: [VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-02 Thread Kenta Murata
I tried on Ubuntu Bionic and got build errors in grpc_ep (version 1.20.0). The error log is shown at the end of this mail. The errors are (1) the absence of the php_generator.h header file, and (2) the absence of the has_ruby_package function in the google::protobuf::FileOptions class. PHP support was int

Re: [VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-02 Thread Sutou Kouhei
Hi, Could Java developers take a look at this? Anyway, could Java developers, especially PMC members, also verify this RC? Thanks, -- kou In <89d43d96-1fb2-4d08-8b52-c484194b9...@gmail.com> "Re: [VOTE] Release Apache Arrow 0.14.0 - RC0" on Wed, 3 Jul 2019 02:59:54 +0900, Yosuke Shiro wrote:

[DISCUSS] C++ SO versioning with 1.0.0

2019-07-02 Thread Sutou Kouhei
Hi, We'll release 0.14.0 soon. Then we will use "1.0.0-SNAPSHOT" on master. If we use "1.0.0-SNAPSHOT", the C++ build fails: https://github.com/apache/arrow/blob/master/cpp/CMakeLists.txt#L47 message(FATAL_ERROR "Need to implement SO version generation for Arrow 1.0+") So we need to consider how t

[jira] [Created] (ARROW-5826) [Website] Blog post for 0.14.0 release announcement

2019-07-02 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5826: Summary: [Website] Blog post for 0.14.0 release announcement Key: ARROW-5826 URL: https://issues.apache.org/jira/browse/ARROW-5826 Project: Apache Arrow Issue

Re: [DISCUSS] 32- and 64-bit decimal types

2019-07-02 Thread Wes McKinney
Note that if we do make this change as described, it will probably need to accompany a bump in the MetadataVersion (for forward-compatibility reasons, otherwise old clients won't be able to distinguish one decimal type from another). But that seems prudent regardless to force an upgrade to the stab

Re: [Discuss] Streaming: Differentiate between length of RecordBatch and utilized portion-- common use-case?

2019-07-02 Thread John Muehlhausen
Crikey! I'll do some testing around that and suggest some test cases to ensure it continues to work, assuming that it does. -John On Tue, Jul 2, 2019 at 2:41 PM Wes McKinney wrote: > Thanks for the attachment, it's helpful. > > On Tue, Jul 2, 2019 at 1:40 PM John Muehlhausen wrote: > > > > Att

[jira] [Created] (ARROW-5825) [Python] Exceptions swallowed in ParquetManifest._visit_directories

2019-07-02 Thread George Sakkis (JIRA)
George Sakkis created ARROW-5825: Summary: [Python] Exceptions swallowed in ParquetManifest._visit_directories Key: ARROW-5825 URL: https://issues.apache.org/jira/browse/ARROW-5825 Project: Apache Arr

Re: [Discuss] Streaming: Differentiate between length of RecordBatch and utilized portion-- common use-case?

2019-07-02 Thread Wes McKinney
Thanks for the attachment, it's helpful. On Tue, Jul 2, 2019 at 1:40 PM John Muehlhausen wrote: > > Attachments referred to in previous two messages: > https://www.dropbox.com/sh/6ycfuivrx70q2jx/AAAt-RDaZWmQ2VqlM-0s6TqWa?dl=0 > > On Tue, Jul 2, 2019 at 1:14 PM John Muehlhausen wrote: > > > Thank

Re: [Discuss] Streaming: Differentiate between length of RecordBatch and utilized portion-- common use-case?

2019-07-02 Thread John Muehlhausen
Attachments referred to in previous two messages: https://www.dropbox.com/sh/6ycfuivrx70q2jx/AAAt-RDaZWmQ2VqlM-0s6TqWa?dl=0 On Tue, Jul 2, 2019 at 1:14 PM John Muehlhausen wrote: > Thanks, Wes, for the thoughtful reply. I really appreciate the > engagement. In order to clarify things a bit, I

Re: [Discuss] Streaming: Differentiate between length of RecordBatch and utilized portion-- common use-case?

2019-07-02 Thread John Muehlhausen
Thanks, Wes, for the thoughtful reply. I really appreciate the engagement. In order to clarify things a bit, I am attaching a graphic of how our application will take record-wise (row-oriented) data from an event source and incrementally populate a pre-allocated Arrow-compatible buffer, including
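
The idea in this message, a buffer allocated once at full capacity and filled incrementally so that its allocated length differs from its utilized portion, can be sketched roughly as follows. This is only an illustration in plain Python, not Arrow code: the class `PreallocatedColumn` and its members are hypothetical names, and a single int64 column stands in for a full record batch.

```python
from array import array


class PreallocatedColumn:
    """Hypothetical fixed-capacity int64 column (not an Arrow API).

    `capacity` is the allocated length; `utilized` counts the slots
    that actually hold data so far.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        # Allocate every slot up front; the buffer is never resized.
        self.values = array("q", [0] * capacity)
        self.utilized = 0

    def append(self, value):
        """Write one incoming record into the next free slot."""
        if self.utilized == self.capacity:
            raise OverflowError("pre-allocated buffer is full")
        self.values[self.utilized] = value
        self.utilized += 1

    def view(self):
        """Return a zero-copy view of the utilized portion only."""
        return memoryview(self.values)[: self.utilized]


col = PreallocatedColumn(capacity=4)
for event in (10, 20):
    col.append(event)
```

A reader that honours `utilized` sees two values here, while the underlying allocation still has four slots.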

Re: [DISCUSS] Ongoing Travis CI service degradation

2019-07-02 Thread Antoine Pitrou
On 02/07/2019 at 19:52, Micah Kornfield wrote: > Would GCP Cloud Build work [1]. The number one question is: does it offer *copious* capacity for open source projects, for free? If it does not, it's not useful to bother investigating it IMHO (there are dozens or even hundreds of online CI prov

[jira] [Created] (ARROW-5824) [Gandiva] [C++] Fix decimal null

2019-07-02 Thread Praveen Kumar Desabandu (JIRA)
Praveen Kumar Desabandu created ARROW-5824: Summary: [Gandiva] [C++] Fix decimal null Key: ARROW-5824 URL: https://issues.apache.org/jira/browse/ARROW-5824 Project: Apache Arrow Iss

Re: [VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-02 Thread Yosuke Shiro
Ran dev/release/verify-release-candidate.sh source 0.14.0 0 on macOS Mojave. I got the following error, but it may be specific to my environment. """ [ERROR] Failed to execute goal pl.project13.maven:git-commit-id-plugin:2.2.2:revision (for-jars) on project arrow-java-root: Could not complete Mo

Re: [DISCUSS] Ongoing Travis CI service degradation

2019-07-02 Thread Micah Kornfield
Would GCP Cloud Build work [1]. When trying to install it looks like the permissions required are: * Read access to code * Read access to issues, metadata, and pull requests * Read and write access to checks and commit statuses It looks like the free tier is quite limited, but I can try to invest

Re: [Discuss] Streaming: Differentiate between length of RecordBatch and utilized portion-- common use-case?

2019-07-02 Thread Wes McKinney
hi John, On Tue, Jul 2, 2019 at 11:23 AM John Muehlhausen wrote: > > During my time building financial analytics and trading systems (23 years!), > both the "batch processing" and "stream processing" paradigms have been > extensively used by myself and by colleagues. > > Unfortunately, the tool

Re: [DISCUSS] Ongoing Travis CI service degradation

2019-07-02 Thread Antoine Pitrou
On 02/07/2019 at 18:22, Eric Erhardt wrote: > Has anyone considered using Azure DevOps for CI and patch validation? Tried indeed and failed: https://issues.apache.org/jira/browse/INFRA-17030 Regards Antoine.

[Discuss] Streaming: Differentiate between length of RecordBatch and utilized portion-- common use-case?

2019-07-02 Thread John Muehlhausen
During my time building financial analytics and trading systems (23 years!), both the "batch processing" and "stream processing" paradigms have been extensively used by myself and by colleagues. Unfortunately, the tools used in these paradigms have not successfully overlapped. For example, an ana

RE: [DISCUSS] Ongoing Travis CI service degradation

2019-07-02 Thread Eric Erhardt
Has anyone considered using Azure DevOps for CI and patch validation? https://azure.microsoft.com/en-us/services/devops/pipelines/ > Get cloud-hosted pipelines for Linux, macOS, and Windows with unlimited > minutes and 10 free parallel jobs for open source I guess I am not familiar with ASF pol

Re: [Discuss] IPC Specification, flatbuffers and unaligned memory accesses

2019-07-02 Thread Wes McKinney
Correct. The encapsulated IPC message will just be 4 bytes bigger. On Tue, Jul 2, 2019, 9:31 AM Antoine Pitrou wrote: > > I guess I still don't understand how the IPC stream format works :-/ > > To put it clearly: what happens in Flight? Will a Flight message > automatically get the "stream cont
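
To make the "4 bytes bigger" remark concrete, here is a rough sketch of a message framed with and without a leading 4-byte marker. Everything specific in it is an assumption for illustration: the marker value `CONTINUE_MARKER`, the little-endian 4-byte length prefix, and the function names are hypothetical and are not taken from the thread or from the Arrow specification.

```python
import struct

# Hypothetical marker value, chosen only for this illustration.
CONTINUE_MARKER = 0xFFFFFFFF


def frame_length_only(metadata: bytes) -> bytes:
    """Assumed older framing: 4-byte length prefix, then the metadata."""
    return struct.pack("<I", len(metadata)) + metadata


def frame_with_marker(metadata: bytes) -> bytes:
    """Assumed newer framing: 4-byte marker, 4-byte length, then the metadata."""
    return struct.pack("<II", CONTINUE_MARKER, len(metadata)) + metadata


def read_frame(buf: bytes) -> bytes:
    """Parse a marker-prefixed frame and return the metadata bytes."""
    marker, length = struct.unpack_from("<II", buf, 0)
    if marker != CONTINUE_MARKER:
        raise ValueError("missing continuation marker")
    return buf[8 : 8 + length]


payload = b"metadata"
old_frame = frame_length_only(payload)
new_frame = frame_with_marker(payload)
size_difference = len(new_frame) - len(old_frame)
```

Under these assumptions the only change to an encapsulated message is the extra 4-byte prefix; the payload bytes themselves are untouched.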

Re: [Discuss] IPC Specification, flatbuffers and unaligned memory accesses

2019-07-02 Thread Antoine Pitrou
I guess I still don't understand how the IPC stream format works :-/ To put it clearly: what happens in Flight? Will a Flight message automatically get the "stream continuation message" in front of it? On 02/07/2019 at 16:15, Wes McKinney wrote: > On Tue, Jul 2, 2019 at 4:23 AM Antoine Pitro

Re: [Discuss] IPC Specification, flatbuffers and unaligned memory accesses

2019-07-02 Thread Wes McKinney
On Tue, Jul 2, 2019 at 4:23 AM Antoine Pitrou wrote: > > > On 02/07/2019 at 00:20, Wes McKinney wrote: > > Thanks for the references. > > > > If we decided to make a change around this, we could call the first 4 > > bytes a stream continuation marker to make it slightly less ugly > > > > * 0xFFF

[jira] [Created] (ARROW-5823) [Rust] Fix build break.

2019-07-02 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5823: Summary: [Rust] Fix build break. Key: ARROW-5823 URL: https://issues.apache.org/jira/browse/ARROW-5823 Project: Apache Arrow Issue Type: Bug Reporter:

[jira] [Created] (ARROW-5822) Provide a sample json file for the flight example

2019-07-02 Thread Liya Fan (JIRA)
Liya Fan created ARROW-5822: Summary: Provide a sample json file for the flight example Key: ARROW-5822 URL: https://issues.apache.org/jira/browse/ARROW-5822 Project: Apache Arrow Issue Type: Impr

Re: Spark and Arrow Flight

2019-07-02 Thread Antoine Pitrou
Either #3 or #4 for me. If #3, the default GetSchema implementation can rely on calling GetFlightInfo. On 01/07/2019 at 22:50, David Li wrote: > I think I'd prefer #3 over overloading an existing call (#2). > > We've been thinking about a similar issue, where sometimes we want > just the sc

[FYI] Windows CRT woes

2019-07-02 Thread Antoine Pitrou
Hello, This is most likely not an Arrow problem but I'm posting this here because it's a weird issue nonetheless: https://issues.apache.org/jira/browse/ARROW-5410 Regards Antoine.

Re: [VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-02 Thread Antoine Pitrou
+1 for this RC0 anyway (binding). On 02/07/2019 at 11:36, Antoine Pitrou wrote: > > I tried again (Ubuntu 18.04): > > * binaries verification succeeded > > * source verification failed in gRPC configure step: > > CMake Error at cmake/cares.cmake:38 (find_package): > Could not find a pac

Re: [VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-02 Thread Antoine Pitrou
I tried again (Ubuntu 18.04): * binaries verification succeeded * source verification failed in gRPC configure step: CMake Error at cmake/cares.cmake:38 (find_package): Could not find a package configuration file provided by "c-ares" with any of the following names: c-aresConfig.cmake

Re: [Discuss] IPC Specification, flatbuffers and unaligned memory accesses

2019-07-02 Thread Antoine Pitrou
On 02/07/2019 at 00:20, Wes McKinney wrote: > Thanks for the references. > > If we decided to make a change around this, we could call the first 4 > bytes a stream continuation marker to make it slightly less ugly > > * 0x: continue > * 0x: stop Do you mean it would be a sepa

[jira] [Created] (ARROW-5821) [Java] Support compact fixed-width vectors

2019-07-02 Thread Ji Liu (JIRA)
Ji Liu created ARROW-5821: Summary: [Java] Support compact fixed-width vectors Key: ARROW-5821 URL: https://issues.apache.org/jira/browse/ARROW-5821 Project: Apache Arrow Issue Type: New Feature