I've opened a pull request [1] to clarify the semantics and edge cases for
dictionary encoding raised in some recent conversations [2][3], in
particular around interleaved batches and batches with isDelta=False.

Specifically, it proposes that isDelta=False indicates dictionary
replacement. For the file format, only one isDelta=False batch is allowed
per file, and isDelta=True batches are applied in the order supplied in the
file footer.
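
As a rough illustration of the proposed semantics, here is a minimal Python
sketch. It is not the Arrow implementation: the function name, the
(is_delta, values) tuple shape, and the file_format flag are hypothetical,
and it models a single dictionary only. It also assumes that a delta batch
arriving before any isDelta=False batch appends to an empty dictionary,
which the proposal text above does not address.

```python
from typing import Iterable, List, Tuple

# Hypothetical stand-in for a dictionary batch: (is_delta, values).
DictionaryBatch = Tuple[bool, List[str]]


def apply_dictionary_batches(
    batches: Iterable[DictionaryBatch], file_format: bool = False
) -> List[str]:
    """Return the dictionary contents after applying batches in order."""
    dictionary: List[str] = []
    replacements_seen = 0
    for is_delta, values in batches:
        if is_delta:
            # isDelta=True: append the new values to the existing dictionary.
            dictionary.extend(values)
        else:
            # isDelta=False: replace the dictionary wholesale.
            replacements_seen += 1
            if file_format and replacements_seen > 1:
                raise ValueError(
                    "file format allows only one isDelta=False batch"
                )
            dictionary = list(values)
    return dictionary


# Stream: a later isDelta=False batch replaces everything seen so far.
stream = [(False, ["a", "b"]), (True, ["c"]), (False, ["x"])]
print(apply_dictionary_batches(stream))  # ['x']

# File: one isDelta=False batch, then deltas applied in the order given.
file_batches = [(False, ["a", "b"]), (True, ["c"]), (True, ["d"])]
print(apply_dictionary_batches(file_batches, file_format=True))
```

Passing the first list with file_format=True raises ValueError, since it
contains two isDelta=False batches.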

In addition, I've added a new enum to DictionaryEncoding to preserve future
compatibility in case we want to expand dictionary encoding to be an
explicit mapping from "ID" to "VALUE" as discussed in [4].

Once people have had a chance to review and come to a consensus, I will
call a formal vote to approve and commit the change.

Thanks,
Micah

[1] https://github.com/apache/arrow/pull/5585
[2] https://lists.apache.org/thread.html/9734b71bc12aca16eb997388e95105bff412fdaefa4e19422f477389@%3Cdev.arrow.apache.org%3E
[3] https://lists.apache.org/thread.html/5c3c9346101df8d758e24664638e8ada0211d310ab756a89cde3786a@%3Cdev.arrow.apache.org%3E
[4] https://lists.apache.org/thread.html/15a4810589b2eb772bce5b2372970d9d93badbd28999a1bbe2af418a@%3Cdev.arrow.apache.org%3E
