Re: [Rust] Adding owners to crates.io for arrow and parquet crates

2019-01-27 Thread Kouhei Sutou
Hi Andy, Thanks for sending an invite. I found it at https://crates.io/me/pending-invites and accepted. Thanks, -- kou In "Re: [Rust] Adding owners to crates.io for arrow and parquet crates" on Sat, 26 Jan 2019 06:33:41 -0700, Andy Grove wrote: > So I just discovered that the crates.io

Re: [Format] Passing selection masks with Arrow record batches

2019-01-27 Thread Paul Taylor
We’ve been doing this in a few different ways at Graphistry, mostly guided by use case and device characteristics. For temporary/in-memory/microservice CPU workloads, we’ll compute a set of valid row indices as one side of a DictionaryVector, with the original table/column as the dictionary sid
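The indices-as-dictionary approach described above can be sketched in plain Python. This is an illustration under stated assumptions, not Graphistry's code or the Arrow JS API: Python lists stand in for Arrow vectors, and `select_indices` and `DictionaryView` are hypothetical names. The filter produces only an index array; the original column is reused unchanged as the dictionary, and rows are looked up on access.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, Sequence, TypeVar

T = TypeVar("T")


def select_indices(column: Sequence[T], predicate: Callable[[T], bool]) -> List[int]:
    """Return the positions of the rows that satisfy the predicate."""
    return [i for i, value in enumerate(column) if predicate(value)]


@dataclass
class DictionaryView(Generic[T]):
    """Hypothetical view: `indices` select rows, `dictionary` is the untouched original column."""

    indices: List[int]
    dictionary: Sequence[T]

    def __len__(self) -> int:
        return len(self.indices)

    def __getitem__(self, i: int) -> T:
        # Decode lazily: look the selected row up in the original column.
        return self.dictionary[self.indices[i]]

    def to_list(self) -> List[T]:
        # Materialize a contiguous copy only when a consumer needs one.
        return [self.dictionary[j] for j in self.indices]
```

For example, `DictionaryView(select_indices([10, 25, 3, 40], lambda p: p > 5), [10, 25, 3, 40])` holds the indices `[0, 1, 3]` and never copies the original column; several such views can share one dictionary.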

[jira] [Created] (ARROW-4400) [CI] install of clang tools failing

2019-01-27 Thread Pindikura Ravindra (JIRA)
Pindikura Ravindra created ARROW-4400: - Summary: [CI] install of clang tools failing Key: ARROW-4400 URL: https://issues.apache.org/jira/browse/ARROW-4400 Project: Apache Arrow Issue Type

[RESULT] [VOTE] Accept donation of Rust DataFusion library for Apache Arrow

2019-01-27 Thread Wes McKinney
The vote carries with 5 binding +1 votes and 6 non-binding +1. Andy -- from the look of https://github.com/andygrove/datafusion/graphs/contributors you may have to track down some contributors to send ICLAs to the Apache secretary to be able to move forward. Some of the IP from these individuals

Re: [Format] Passing selection masks with Arrow record batches

2019-01-27 Thread Ravindra Pindikura
> On Jan 28, 2019, at 11:47 AM, Wes McKinney wrote: > > On Mon, Jan 28, 2019 at 12:05 AM Ravindra Pindikura > wrote: >> >> >> >>> On Jan 28, 2019, at 11:22 AM, Wes McKinney wrote: >>> >>> I was having a discussion recently about Arrow and the topic of >>> serve

Re: [Format] Passing selection masks with Arrow record batches

2019-01-27 Thread Wes McKinney
On Mon, Jan 28, 2019 at 12:05 AM Ravindra Pindikura wrote: > > > > > On Jan 28, 2019, at 11:22 AM, Wes McKinney wrote: > > > > I was having a discussion recently about Arrow and the topic of > > server-side filtering vs. client-side filtering came up. > > > > The basic problem is this: > > > > If

Re: [Format] Passing selection masks with Arrow record batches

2019-01-27 Thread Ravindra Pindikura
> On Jan 28, 2019, at 11:22 AM, Wes McKinney wrote: > > I was having a discussion recently about Arrow and the topic of > server-side filtering vs. client-side filtering came up. > > The basic problem is this: > > If you have a RecordBatch that you wish to filter out some of the > "rows", on

Re: [VOTE] Accept donation of Rust DataFusion library for Apache Arrow

2019-01-27 Thread Andy Grove
Thanks for all the votes on this donation! Wes - it looks like votes have stopped now. Is the current number of votes sufficient? Thanks, Andy. On Thu, Jan 24, 2019 at 8:43 PM Kouhei Sutou wrote: > +1 (binding) > > In > "[VOTE] Accept donation of Rust DataFusion library for Apache Arrow"

[Format] Passing selection masks with Arrow record batches

2019-01-27 Thread Wes McKinney
I was having a discussion recently about Arrow and the topic of server-side filtering vs. client-side filtering came up. The basic problem is this: If you have a RecordBatch that you wish to filter out some of the "rows", one way to track this in-memory is to create a separate array of true/false
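A minimal sketch of the "separate array of true/false" idea, assuming a record batch is modeled as a dict of equal-length Python lists. `pack_mask`, `is_set` and `filter_batch` are hypothetical helpers, not part of any Arrow library or of the Arrow format; the mask is packed one bit per row, least-significant bit first, the same bit order Arrow uses for its validity bitmaps.

```python
from typing import Dict, List, Sequence


def pack_mask(mask: Sequence[bool]) -> bytes:
    """Pack booleans into bytes, least-significant bit first."""
    out = bytearray((len(mask) + 7) // 8)
    for i, keep in enumerate(mask):
        if keep:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def is_set(bitmap: bytes, i: int) -> bool:
    """Test bit i of an LSB-first bitmap."""
    return ((bitmap[i // 8] >> (i % 8)) & 1) == 1


def filter_batch(batch: Dict[str, List], bitmap: bytes) -> Dict[str, List]:
    """Materialize a new batch containing only the rows whose bit is set."""
    num_rows = len(next(iter(batch.values()), []))
    keep = [i for i in range(num_rows) if is_set(bitmap, i)]
    return {name: [col[i] for i in keep] for name, col in batch.items()}
```

Shipping the bitmap alongside an unmodified batch costs one bit per row and lets the receiver decide when, or whether, to materialize the filtered rows.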

[jira] [Created] (ARROW-4399) [C++] Remove usage of "extern template class" from NumericArray

2019-01-27 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4399: --- Summary: [C++] Remove usage of "extern template class" from NumericArray Key: ARROW-4399 URL: https://issues.apache.org/jira/browse/ARROW-4399 Project: Apache Arrow

[jira] [Created] (ARROW-4398) [Python] Add benchmarks for Arrow<>Parquet BYTE_ARRAY serialization (read and write)

2019-01-27 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4398: --- Summary: [Python] Add benchmarks for Arrow<>Parquet BYTE_ARRAY serialization (read and write) Key: ARROW-4398 URL: https://issues.apache.org/jira/browse/ARROW-4398 Proj

[jira] [Created] (ARROW-4397) [C++] dim_names in Tensor and SparseTensor

2019-01-27 Thread Kenta Murata (JIRA)
Kenta Murata created ARROW-4397: --- Summary: [C++] dim_names in Tensor and SparseTensor Key: ARROW-4397 URL: https://issues.apache.org/jira/browse/ARROW-4397 Project: Apache Arrow Issue Type: New

[jira] [Created] (ARROW-4396) Update Typedoc to support TypeScript 3.2

2019-01-27 Thread Paul Taylor (JIRA)
Paul Taylor created ARROW-4396: -- Summary: Update Typedoc to support TypeScript 3.2 Key: ARROW-4396 URL: https://issues.apache.org/jira/browse/ARROW-4396 Project: Apache Arrow Issue Type: Improve

[jira] [Created] (ARROW-4395) ts-node throws type error running `bin/arrow2csv.js`

2019-01-27 Thread Paul Taylor (JIRA)
Paul Taylor created ARROW-4395: -- Summary: ts-node throws type error running `bin/arrow2csv.js` Key: ARROW-4395 URL: https://issues.apache.org/jira/browse/ARROW-4395 Project: Apache Arrow Issue T

[jira] [Created] (ARROW-4394) [C++] Can't build with debug option and MinGW

2019-01-27 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-4394: --- Summary: [C++] Can't build with debug option and MinGW Key: ARROW-4394 URL: https://issues.apache.org/jira/browse/ARROW-4394 Project: Apache Arrow Issue Type:

Re: [Rust] code style: restrict line width to 90 characters?

2019-01-27 Thread Chao Sun
Looks like we have a majority. I just filed a PR [1] for this. One thing though is that the "comment_width" flag is only available in nightly, so there's no easy way to check that unless we use the nightly rustfmt checker. Chao [1]: https://github.com/apache/arrow/pull/3501 On Fri, Jan 25, 2019

[jira] [Created] (ARROW-4393) [Rust] coding style: apply 90 characters per line limit

2019-01-27 Thread Chao Sun (JIRA)
Chao Sun created ARROW-4393: --- Summary: [Rust] coding style: apply 90 characters per line limit Key: ARROW-4393 URL: https://issues.apache.org/jira/browse/ARROW-4393 Project: Apache Arrow Issue Type

[jira] [Created] (ARROW-4392) [Rust] Implement high-level Parquet writer

2019-01-27 Thread Chao Sun (JIRA)
Chao Sun created ARROW-4392: --- Summary: [Rust] Implement high-level Parquet writer Key: ARROW-4392 URL: https://issues.apache.org/jira/browse/ARROW-4392 Project: Apache Arrow Issue Type: New Feature

[jira] [Created] (ARROW-4391) [Python] Support pyarrow.hdfs on Windows

2019-01-27 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4391: --- Summary: [Python] Support pyarrow.hdfs on Windows Key: ARROW-4391 URL: https://issues.apache.org/jira/browse/ARROW-4391 Project: Apache Arrow Issue Type: Impro

[jira] [Created] (ARROW-4390) [R] Serialize "labeled" metadata in Feather files, IPC messages

2019-01-27 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4390: --- Summary: [R] Serialize "labeled" metadata in Feather files, IPC messages Key: ARROW-4390 URL: https://issues.apache.org/jira/browse/ARROW-4390 Project: Apache Arrow

[jira] [Created] (ARROW-4389) [R] Installing clang-tools in CI is failing on trusty

2019-01-27 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4389: -- Summary: [R] Installing clang-tools in CI is failing on trusty Key: ARROW-4389 URL: https://issues.apache.org/jira/browse/ARROW-4389 Project: Apache Arrow Issue

[jira] [Created] (ARROW-4388) [Go] add DimNames() method to tensor Interface?

2019-01-27 Thread Randall O'Reilly (JIRA)
Randall O'Reilly created ARROW-4388: --- Summary: [Go] add DimNames() method to tensor Interface? Key: ARROW-4388 URL: https://issues.apache.org/jira/browse/ARROW-4388 Project: Apache Arrow Is

[jira] [Created] (ARROW-4387) [Go] tensor doesn't support access to Null bitmap data?

2019-01-27 Thread Randall O'Reilly (JIRA)
Randall O'Reilly created ARROW-4387: --- Summary: [Go] tensor doesn't support access to Null bitmap data? Key: ARROW-4387 URL: https://issues.apache.org/jira/browse/ARROW-4387 Project: Apache Arrow

Re: [Format] [Rust] ChunkedArray, Column and Table

2019-01-27 Thread Wes McKinney
Just to add my two cents: The Arrow specification and Flatbuffers files define a _binary protocol_ for making data available at the contiguous record batch level either in-process or via some other address space (a memory mapped file, a socket payload / RPC message). Chunked arrays and tables ar
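For readers unfamiliar with the construct: a chunked array presents several contiguous arrays (for example, one per record batch) as a single logical column. The class below is a hypothetical pure-Python sketch, not the C++ or Rust `ChunkedArray`; it resolves a logical row index to a (chunk, offset) pair by binary search over cumulative chunk lengths, and it skips empty chunks.

```python
import bisect
from itertools import accumulate
from typing import Sequence, Tuple


class ChunkedArray:
    """Hypothetical logical column backed by several contiguous chunks."""

    def __init__(self, chunks: Sequence[Sequence]):
        self.chunks = list(chunks)
        # offsets[k] is the logical index of the first element of chunk k;
        # the final entry is the total length.
        self.offsets = [0] + list(accumulate(len(c) for c in self.chunks))

    def __len__(self) -> int:
        return self.offsets[-1]

    def locate(self, i: int) -> Tuple[int, int]:
        """Map a logical row index to (chunk number, index within that chunk)."""
        if not 0 <= i < len(self):
            raise IndexError(i)
        # bisect_right moves past equal offsets, so empty chunks are skipped.
        k = bisect.bisect_right(self.offsets, i) - 1
        return k, i - self.offsets[k]

    def __getitem__(self, i: int):
        k, j = self.locate(i)
        return self.chunks[k][j]
```

No data is copied when chunks are combined this way, which is the trade-off against concatenating them into one contiguous array.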

Re: [Testing] Create csv-testing submodule?

2019-01-27 Thread Wes McKinney
That was my intent when I created that repo, so SGTM On Sun, Jan 27, 2019 at 10:57 AM Andy Grove wrote: > > I see we have an arrow-testing repo already (although it seems to be mostly > empty). Would this be the correct place to create a PR to add test files? > > On Sun, Jan 27, 2019 at 9:53 AM W

[jira] [Created] (ARROW-4386) [Rust] Implement Date and Time Arrays

2019-01-27 Thread nevi_me (JIRA)
nevi_me created ARROW-4386: -- Summary: [Rust] Implement Date and Time Arrays Key: ARROW-4386 URL: https://issues.apache.org/jira/browse/ARROW-4386 Project: Apache Arrow Issue Type: New Feature

Re: [Testing] Create csv-testing submodule?

2019-01-27 Thread Andy Grove
I see we have an arrow-testing repo already (although it seems to be mostly empty). Would this be the correct place to create a PR to add test files? On Sun, Jan 27, 2019 at 9:53 AM Wes McKinney wrote: > I'm in favor of using a submodule for testing data files to avoid > bloating the git reposit

Re: [Testing] Create csv-testing submodule?

2019-01-27 Thread Wes McKinney
I'm in favor of using a submodule for testing data files to avoid bloating the git repository. So far this hasn't been too painful with the Parquet test data files On Sun, Jan 27, 2019 at 10:36 AM Andy Grove wrote: > > That's a fair point about not needing a submodule... I was thinking about > co

Re: [Testing] Create csv-testing submodule?

2019-01-27 Thread Andy Grove
That's a fair point about not needing a submodule... I was thinking about converting some of the shared parquet files to CSV to help with testing DataFusion. I guess I can just put them there for now and if other implementations are interested we can just move them to a shared directory. Thanks,

Re: [Testing] Create csv-testing submodule?

2019-01-27 Thread Antoine Pitrou
Well, CSV isn't a standard like Parquet is, meaning each implementation can choose its own middle grounds and interpretations. Also, the parquet-testing submodule exists because Parquet implementations are spread across different repositories. If we want a common location for CSV files accro

[jira] [Created] (ARROW-4385) [Python] default_version of a release should not include SNAPSHOT

2019-01-27 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4385: -- Summary: [Python] default_version of a release should not include SNAPSHOT Key: ARROW-4385 URL: https://issues.apache.org/jira/browse/ARROW-4385 Project: Apache Arrow

[Testing] Create csv-testing submodule?

2019-01-27 Thread Andy Grove
I like the fact that we have a parquet-testing submodule that is shared across implementations. Is there any interest in having an equivalent for CSV files? Andy.

Re: [Format] [Rust] ChunkedArray, Column and Table

2019-01-27 Thread Antoine Pitrou
Hi Neville, On 27/01/2019 at 13:07, Neville Dipale wrote: > Hi Antoine, > > I've given your response some thought. > > I'm thinking more about the computational aspect of Arrow. I agree > that for representing and sharing data, RecordBatches achieve the purpose. > > I came across Chunk

[jira] [Created] (ARROW-4384) [C++] Running "format" target on new Windows 10 install opens "how do you want to open this file" dialog

2019-01-27 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4384: --- Summary: [C++] Running "format" target on new Windows 10 install opens "how do you want to open this file" dialog Key: ARROW-4384 URL: https://issues.apache.org/jira/browse/ARROW-43

Re: [RESULT] [VOTE] Release Apache Arrow 0.12.0 RC4

2019-01-27 Thread Krisztián Szűcs
Hey Kou! On Sun, Jan 27, 2019 at 12:02 AM Kouhei Sutou wrote: > Hi Krisztián, > > Could you also add your GPG key to > https://dist.apache.org/repos/dist/release/arrow/KEYS ? > Done! We should mention it somewhere in the (post-)release guide. Thanks! > > > Thanks, > -- > kou > > In > "Re:

Re: [Format] [Rust] ChunkedArray, Column and Table

2019-01-27 Thread Neville Dipale
Hi Antoine, I've given your response some thought. I'm thinking more about the computational aspect of Arrow. I agree that for representing and sharing data, RecordBatches achieve the purpose. I came across ChunkedArray, Column and Table while I was trying to create a dataframe library in R