Re: [ACTION REQUIRED] Changes to Arrow JIRA-related e-mail notifications

2020-06-17 Thread Fan Liya
Hi Wes, Thank you for your effort. I sent an email to issues-subscr...@arrow.apache.org, but got no response. In addition, I am not receiving JIRA information now. Best, Liya Fan On Mon, Jun 15, 2020 at 3:50 AM Wes McKinney wrote: > hi folks, > > Per the mailing list discussion and INFRA-20419

Re: Flight benchmark question

2020-06-17 Thread Yibo Cai
On 6/17/20 8:33 PM, David Li wrote:

Re: Two proposals for expanding arrow Table API (virtual arrays and random access)

2020-06-17 Thread Neal Richardson
Maybe a draft pull request? If you put "WIP" in the pull request title, CI won't run builds on it, so it's suitable for rough outlines and collecting feedback. Neal On Wed, Jun 17, 2020 at 2:57 PM Radu Teodorescu wrote: > Thank you Wes! > Yes, both proposals fit very nicely in your Data Frames

Re: Two proposals for expanding arrow Table API (virtual arrays and random access)

2020-06-17 Thread Radu Teodorescu
Thank you Wes! Yes, both proposals fit very nicely in your Data Frames vision; I see them as deep dives on some specifics: - the virtual array doc is more fluffy, and if you agree with the general concept, the next logical move is probably to put out some interfaces indeed - the random access doc g

Re: Two proposals for expanding arrow Table API (virtual arrays and random access)

2020-06-17 Thread Wes McKinney
hi Radu, I'll read the proposals in more detail when I can and make comments, but this has always been something of interest (see, e.g. [1]). The intent with the "C++ data frames" project that we've discussed (and I continue to labor towards, e.g. recent compute engine work is directly in service

Two proposals for expanding arrow Table API (virtual arrays and random access)

2020-06-17 Thread Radu Teodorescu
Hi folks, While I’ve communicated with some members of this group in the past, this is my first official post, so please excuse/correct/guide me as needed. Logistics first: I put most of the content of my proposals in a Google Doc, but if more appropriate, we can keep the conversation going b

Re: Flight benchmark question

2020-06-17 Thread David Li
Hey Yibo, Thanks for investigating this! This is a great writeup. There was a PR recently to let clients set gRPC options like this, so it can be enabled on a case-by-case basis: https://github.com/apache/arrow/pull/7406 So we could add that to the benchmark or suggest it in documentation. I thi

Re: [C++] Kernels with scalar input

2020-06-17 Thread Wes McKinney
hi Uwe, For Contains, wouldn't you want to make the kernel binary so that the "match" argument can be data-varying (e.g. it can be an Array, too)? Aside from that, the way to pass static data to the kernel is through an options struct. So you would do struct MyArg : public FunctionOptions { std::stri

Pandas string type

2020-06-17 Thread Adam Lippai
Hi, I was reading https://wesmckinney.com/blog/high-perf-arrow-to-pandas/ where Wes writes > "string or binary data would come with additional overhead while pandas > continues to use Python objects in its memory representation" Pandas 1.0 introduced StringDtype which I thought could help with

[NIGHTLY] Arrow Build Report for Job nightly-2020-06-17-0

2020-06-17 Thread Crossbow
Arrow Build Report for Job nightly-2020-06-17-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-17-0 Failed Tasks: - centos-7-aarch64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-17-0-travis-centos-7-aarch64 - centos-8-am

[C++] Kernels with scalar input

2020-06-17 Thread Uwe L. Korn
Hello all, I'm trying to implement a `contains` kernel that takes as an input a StringArray and a scalar string (see https://issues.apache.org/jira/browse/ARROW-9160). I feel confident with the rest of the new Kernels setup but I didn't find an example kernel where we also pass in a scalar att

Re: Flight benchmark question

2020-06-17 Thread Chengxin Ma
Hi Yibo, Your discovery is impressive. Did you consider the `num_streams` parameter [1] as well? If I understood correctly, this parameter is used for setting the conceptual concurrent streams between the client and the server, while `num_threads` is used for setting the size of the thread p