I've opened a pull request [1] to clarify some recent conversations about semantics/edge cases for dictionary encoding [2][3] around interleaved batches and when isDelta=False.
Specifically, it proposes that isDelta=False indicates dictionary replacement. For the file format, only one isDelta=False batch is allowed per file, and isDelta=True batches are applied in the order supplied in the file footer.

In addition, I've added a new enum to DictionaryEncoding to preserve future compatibility in case we want to expand dictionary encoding to be an explicit mapping from "ID" to "VALUE" as discussed in [4].

Once people have had a chance to review and come to a consensus, I will call a formal vote to approve and commit the change.

Thanks,
Micah

[1] https://github.com/apache/arrow/pull/5585
[2] https://lists.apache.org/thread.html/9734b71bc12aca16eb997388e95105bff412fdaefa4e19422f477389@%3Cdev.arrow.apache.org%3E
[3] https://lists.apache.org/thread.html/5c3c9346101df8d758e24664638e8ada0211d310ab756a89cde3786a@%3Cdev.arrow.apache.org%3E
[4] https://lists.apache.org/thread.html/15a4810589b2eb772bce5b2372970d9d93badbd28999a1bbe2af418a@%3Cdev.arrow.apache.org%3E
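
P.S. For anyone who wants a concrete picture of the proposed semantics, here is a minimal Python sketch of how a reader might apply a sequence of dictionary batches: a batch with isDelta=False replaces the dictionary, and a batch with isDelta=True appends to it. The DictionaryBatch class and apply_batches function are hypothetical names made up for illustration and are not part of any Arrow library. Raising an error for a delta that arrives before any dictionary exists is my assumption, not something the proposal states, and the file-format restriction on the number of isDelta=False batches is not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DictionaryBatch:
    """Hypothetical stand-in for a dictionary batch message (illustration only)."""
    dict_id: int
    values: List[str]
    is_delta: bool


def apply_batches(batches: List[DictionaryBatch]) -> Dict[int, List[str]]:
    """Apply batches in order and return the resulting dictionary per ID."""
    dictionaries: Dict[int, List[str]] = {}
    for batch in batches:
        if batch.is_delta:
            # isDelta=True: append the new values to the existing dictionary.
            if batch.dict_id not in dictionaries:
                # Assumption: a delta with nothing to extend is treated as an error.
                raise ValueError(f"delta for unknown dictionary {batch.dict_id}")
            dictionaries[batch.dict_id] = dictionaries[batch.dict_id] + batch.values
        else:
            # isDelta=False: replace whatever dictionary was there before.
            dictionaries[batch.dict_id] = list(batch.values)
    return dictionaries


if __name__ == "__main__":
    stream = [
        DictionaryBatch(0, ["a", "b"], is_delta=False),
        DictionaryBatch(0, ["c"], is_delta=True),
        DictionaryBatch(0, ["x"], is_delta=False),
    ]
    print(apply_batches(stream[:2]))
    print(apply_batches(stream))
```

After the first two batches, dictionary 0 holds the original values plus the appended delta; the third batch then discards all of that and replaces it.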