Re: Allow dictionary-encoded children?

2018-04-06 Thread Brian Hulette
Thanks Uwe, Wes, glad to hear I'm not too far out there :) The dictionary batch ordering seems like a reasonable requirement for this situation. I made a JIRA to add something like this to the integration tests (https://issues.apache.org/jira/browse/ARROW-2412) and I'll put up a PR shortly.

[jira] [Created] (ARROW-2412) [Integration] Add nested dictionary integration test

2018-04-06 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-2412: Summary: [Integration] Add nested dictionary integration test Key: ARROW-2412 URL: https://issues.apache.org/jira/browse/ARROW-2412 Project: Apache Arrow Iss

Re: Allow dictionary-encoded children?

2018-04-06 Thread Wes McKinney
Having dictionaries-within-dictionaries does add some complexity, but I think the use case is valid and so it would be good to determine the best way to handle this in the IPC / messaging protocol. I would suggest: dictionaries can use other dictionaries, so long as those dictionaries occur earlie
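Wes's suggested rule (a dictionary may use other dictionaries as long as those occur earlier in the stream) can be checked mechanically. Below is a minimal sketch in Python. It assumes a hypothetical model in which each dictionary batch is a pair of (dictionary id, set of dictionary ids its values use). This is not the API of any Arrow IPC reader.

```python
from typing import Iterable, Set, Tuple

# Hypothetical model: a dictionary batch is its dictionary id paired with the
# ids of any dictionaries that its own values are encoded against.
DictionaryBatch = Tuple[int, Set[int]]


def check_dictionary_order(batches: Iterable[DictionaryBatch]) -> None:
    """Raise ValueError if a batch uses a dictionary that has not been sent yet."""
    seen: Set[int] = set()
    for dict_id, depends_on in batches:
        missing = depends_on - seen
        if missing:
            raise ValueError(
                f"dictionary {dict_id} uses {sorted(missing)} before they were sent"
            )
        seen.add(dict_id)


# Dictionary 1 (e.g. a struct's dictionary) uses dictionary 0 (a child's).
check_dictionary_order([(0, set()), (1, {0})])  # accepted

try:
    check_dictionary_order([(1, {0}), (0, set())])
except ValueError as err:
    print(err)  # dictionary 1 uses [0] before they were sent
```

A reader that enforces this rule only has to track which ids it has already seen, so it never needs to buffer a dictionary while waiting for one it depends on.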

[jira] [Created] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-06 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2411: Summary: [C++] Add method to append batches of null-terminated strings to StringBuilder Key: ARROW-2411 URL: https://issues.apache.org/jira/browse/ARROW-2411 Project: Apa

Re: Allow dictionary-encoded children?

2018-04-06 Thread Uwe L. Korn
Hello Brian, I would also have considered this a legitimate use of the Arrow specification. We only specify that a DictionaryType has a dictionary of any Arrow type. In the context of Arrow's IPC this seems to be a bit more complicated as we seem to have the assumption that there is only one t
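Because a DictionaryType's dictionary may be of any Arrow type, its value type can itself contain dictionary-encoded fields. One way a writer could send nested dictionaries before the dictionaries that use them (the ordering discussed in this thread) is to walk the type tree in post-order. The sketch below uses hypothetical Python classes (`Primitive`, `Struct`, `Dictionary`) rather than pyarrow's type objects, and omits struct field names for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Primitive:
    name: str  # e.g. "utf8", "float64"


@dataclass
class Struct:
    children: List["ArrowType"] = field(default_factory=list)  # names omitted


@dataclass
class Dictionary:
    dict_id: int
    value_type: "ArrowType"  # any type, including one with nested dictionaries


ArrowType = Union[Primitive, Struct, Dictionary]


def dictionary_send_order(t: ArrowType) -> List[int]:
    """Return dictionary ids so that inner dictionaries precede outer ones."""
    order: List[int] = []

    def visit(node: ArrowType) -> None:
        if isinstance(node, Struct):
            for child in node.children:
                visit(child)
        elif isinstance(node, Dictionary):
            visit(node.value_type)          # inner dictionaries first
            if node.dict_id not in order:   # a shared dictionary is sent once
                order.append(node.dict_id)  # then the dictionary that uses them

    visit(t)
    return order


# A dictionary-encoded struct (id 1) whose second child is itself
# dictionary-encoded (id 0).
track = Dictionary(1, Struct([Primitive("float64"), Dictionary(0, Primitive("utf8"))]))
print(dictionary_send_order(track))  # [0, 1]
```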

[jira] [Created] (ARROW-2410) [JS] Add DataFrame.scanAsync

2018-04-06 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-2410: Summary: [JS] Add DataFrame.scanAsync Key: ARROW-2410 URL: https://issues.apache.org/jira/browse/ARROW-2410 Project: Apache Arrow Issue Type: Improvement

[jira] [Created] (ARROW-2409) [Rust] Test for build warnings, remove current warnings

2018-04-06 Thread Maximilian Roos (JIRA)
Maximilian Roos created ARROW-2409: Summary: [Rust] Test for build warnings, remove current warnings Key: ARROW-2409 URL: https://issues.apache.org/jira/browse/ARROW-2409 Project: Apache Arrow

Allow dictionary-encoded children?

2018-04-06 Thread Brian Hulette
I've been considering a use-case with a dictionary-encoded struct column, which may contain some dictionary-encoded columns itself. More specifically, in this use-case each row represents a single observation in a geospatial track, which includes a position, a time, and some track-level metadat
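The two-level encoding described here can be illustrated in plain Python, without an Arrow library. In this sketch each row's track-level metadata is a struct, and the struct column is dictionary-encoded so that repeated metadata is stored once. A string field inside that struct is then dictionary-encoded again. The field names (`track_id`, `platform`) are hypothetical, since the message's list of metadata is truncated.

```python
from typing import Dict, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T", bound=Hashable)


def dictionary_encode(values: Sequence[T]) -> Tuple[List[int], List[T]]:
    """Map each value to an index into a list of distinct values (first-seen order)."""
    index_of: Dict[T, int] = {}
    dictionary: List[T] = []
    indices: List[int] = []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        indices.append(index_of[v])
    return indices, dictionary


# Hypothetical per-observation track metadata: (track_id, platform) pairs.
metadata = [("t1", "ship"), ("t1", "ship"), ("t2", "aircraft"), ("t3", "ship")]

# Outer level: dictionary-encode the struct column itself.
row_indices, structs = dictionary_encode(metadata)
# Inner level: dictionary-encode the "platform" child of the struct dictionary.
platform_indices, platforms = dictionary_encode([s[1] for s in structs])

print(row_indices)       # [0, 0, 1, 2]
print(platform_indices)  # [0, 1, 0]
print(platforms)         # ['ship', 'aircraft']
```

Decoding row `i`'s platform takes two lookups: `platforms[platform_indices[row_indices[i]]]`. This is why the inner `platforms` dictionary has to be available before the outer struct dictionary can be interpreted.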

[jira] [Created] (ARROW-2408) [Rust] It should be possible to get a &mut[T] from Builder

2018-04-06 Thread Andy Grove (JIRA)
Andy Grove created ARROW-2408: Summary: [Rust] It should be possible to get a &mut[T] from Builder Key: ARROW-2408 URL: https://issues.apache.org/jira/browse/ARROW-2408 Project: Apache Arrow Is

[jira] [Created] (ARROW-2407) [GLib] Add garrow_string_array_builder_append_values()

2018-04-06 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-2407: Summary: [GLib] Add garrow_string_array_builder_append_values() Key: ARROW-2407 URL: https://issues.apache.org/jira/browse/ARROW-2407 Project: Apache Arrow Iss

[jira] [Created] (ARROW-2406) [Python] Segfault when creating PyArrow table from Pandas for empty string column when schema provided

2018-04-06 Thread Dave Challis (JIRA)
Dave Challis created ARROW-2406: Summary: [Python] Segfault when creating PyArrow table from Pandas for empty string column when schema provided Key: ARROW-2406 URL: https://issues.apache.org/jira/browse/ARROW-2406

[jira] [Created] (ARROW-2405) [C++] is missing in plasma/client.h

2018-04-06 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-2405: Summary: [C++] is missing in plasma/client.h Key: ARROW-2405 URL: https://issues.apache.org/jira/browse/ARROW-2405 Project: Apache Arrow Issue Type: Bug