Re: [Discuss] Support read/write interleaved dictionaries and batches in IPC stream

2019-08-27 Thread Wes McKinney
hi all, I agree with handling the interleaved dictionary case. While we are at it, I think we should formally allow dictionary replacements. Note that I just opened https://github.com/apache/arrow/pull/5202 to revamp / consolidate the format documents. So any changes will need to be based on th

Re: [Discuss] Support read/write interleaved dictionaries and batches in IPC stream

2019-08-21 Thread Micah Kornfield
Hi Ji Liu, Thanks for getting the conversation started. I think a few things need to happen: 1. We need to clarify in the specification that not all dictionaries need to be present at the beginning. I plan on creating a PR for discussion that clarifies this point, as well as handling of non-delt

[Discuss] Support read/write interleaved dictionaries and batches in IPC stream

2019-08-21 Thread Ji Liu
Hi all, Recently, while we were fixing an IPC-related bug on both the Java and C++ sides[1][2], @emkornfield found that the stream reader assumes that all dictionaries are at the start of the stream, which is inconsistent with the spec[3], which says as long as a record batch doesn't reference a dictionar
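
To illustrate the behaviour under discussion, here is a minimal toy sketch of a reader that accepts dictionaries interleaved with record batches. It is not Arrow's implementation or API: the message tuples, the `read_stream` function, and the column layout are all hypothetical stand-ins. The only rule it assumes is that a dictionary must arrive before the first batch that references it, and that a later dictionary message with the same id replaces the earlier one.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical message shapes (not the Arrow wire format):
#   ("dictionary", dict_id, values)
#   ("batch", {column_name: (dict_id, indices)})

def read_stream(messages: Iterable[Tuple]) -> Iterator[Dict[str, List[str]]]:
    """Decode batches, accepting dictionaries anywhere in the stream."""
    dictionaries: Dict[int, List[str]] = {}
    for msg in messages:
        kind = msg[0]
        if kind == "dictionary":
            _, dict_id, values = msg
            # A repeated id replaces the previous dictionary (assumed semantics).
            dictionaries[dict_id] = list(values)
        elif kind == "batch":
            _, columns = msg
            decoded: Dict[str, List[str]] = {}
            for name, (dict_id, indices) in columns.items():
                if dict_id not in dictionaries:
                    # The batch references a dictionary that has not been sent yet.
                    raise ValueError(f"dictionary {dict_id} not yet seen")
                values = dictionaries[dict_id]
                decoded[name] = [values[i] for i in indices]
            yield decoded
        else:
            raise ValueError(f"unknown message kind: {kind!r}")


stream = [
    ("dictionary", 0, ["a", "b"]),
    ("batch", {"x": (0, [0, 1, 1])}),
    ("dictionary", 1, ["u", "v"]),  # arrives after the first batch
    ("batch", {"x": (0, [1]), "y": (1, [0])}),
]
batches = list(read_stream(stream))
```

In this sketch the reader never requires every dictionary up front; it only fails when a batch refers to a dictionary id it has not yet received.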