Hello,

As discussed in [1], I've opened a PR [2] with the following clarifications and changes to the specification:

1. Dictionary batches are not required to all occur at the beginning of the IPC stream format (if the first record batch has an all-null dictionary-encoded column, that column's dictionary might not be sent until later in the stream).

2. In an IPC stream, a second dictionary batch for the same ID that is not a "delta batch" indicates that the dictionary should be replaced.

3. The file format can contain only 1 "non-delta" dictionary batch and multiple "delta" dictionary batches.

4. An enum is added to the dictionary metadata for possible future changes in the format in which dictionary batches can be sent (the most likely would be an array Map<Int, Value>). The enum is needed as a placeholder to allow for forward compatibility past the 1.0.0 release.

If accepted, there will be work in all implementations to make sure that they cover the edge cases highlighted, and additional integration testing will be needed.

Please vote whether to accept these additions. The vote will be open for at least 72 hours.

[ ] +1 Accept these changes to the specification
[ ] +0
[ ] -1 Do not accept the changes because...

Thanks,
Micah

[1] https://lists.apache.org/thread.html/d0f137e9db0abfcfde2ef879ca517a710f620e5be4dd749923d22c37@%3Cdev.arrow.apache.org%3E
[2] https://github.com/apache/arrow/pull/5585
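
For illustration only, and not part of the PR: a minimal Python sketch of how a stream reader might apply points 1 and 2 above. The DictionaryBatch class, its field names, and apply_dictionary_batch are hypothetical and not taken from any Arrow implementation. Treating a delta batch for an ID with no prior dictionary as an append to an empty dictionary is also an assumption; the proposal does not address that case.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DictionaryBatch:
    """Hypothetical stand-in for an IPC dictionary batch message."""
    dict_id: int
    values: List[str]
    is_delta: bool = False


def apply_dictionary_batch(state: Dict[int, List[str]], batch: DictionaryBatch) -> None:
    """Update the reader's per-ID dictionaries in place.

    Dictionary batches may arrive at any point in the stream (point 1),
    so this is called whenever one is encountered, not only up front.
    """
    if batch.is_delta:
        # Delta batch: append to the dictionary already known for this ID.
        # Assumption: an unknown ID starts from an empty dictionary.
        state.setdefault(batch.dict_id, []).extend(batch.values)
    else:
        # Non-delta batch: the first one defines the dictionary; a later one
        # for the same ID replaces it (point 2).
        state[batch.dict_id] = list(batch.values)


state: Dict[int, List[str]] = {}
apply_dictionary_batch(state, DictionaryBatch(0, ["a", "b"]))
apply_dictionary_batch(state, DictionaryBatch(0, ["c"], is_delta=True))
assert state[0] == ["a", "b", "c"]  # delta appended
apply_dictionary_batch(state, DictionaryBatch(0, ["x"]))
assert state[0] == ["x"]  # non-delta replaced
```

Under point 3, a file-format reader would instead reject a second non-delta batch rather than replace the dictionary.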