Hi,

As discussed on the mailing list [1], it has been proposed to allow the use of unsigned dictionary indices (which is already technically possible in our metadata serialization, but not allowed according to the language of the columnar specification), with the following caveats:

* Unless part of an application's requirements (e.g. if it is necessary to store dictionaries with size 128 to 255 more compactly), implementations are recommended to prefer signed over unsigned integers, with int32 continuing to be the "default" when the indexType field of DictionaryEncoding is null.
* uint64 dictionary indices, while permitted, are strongly discouraged unless required by an application, as they are more difficult to work with in some programming languages (e.g. Java) and they do not offer the storage size benefits that uint8 and uint16 do.

This change is backwards compatible, but not forward compatible for all implementations (for example, C++ will reject unsigned integers). Assuming that the V5 MetadataVersion change is accepted, to protect against forward compatibility issues, implementations would be recommended not to allow unsigned dictionary indices to be serialized using the V4 MetadataVersion.

A PR with the changes to the columnar specification (possibly subject to some clarifying language) is at [2].

The vote will be open for at least 72 hours.

[ ] +1 Accept changes to allow unsigned integer dictionary indices
[ ] +0
[ ] -1 Do not accept because...

[1]: https://lists.apache.org/thread.html/r746e0a76c4737a2cf48dec656103677169bebb303240e62ae1c66d35%40%3Cdev.arrow.apache.org%3E
[2]: https://github.com/apache/arrow/pull/7567
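
For illustration only, the caveats above could be sketched in plain Python as follows. This is not code from the PR or from any Arrow implementation: the function names, the `compact` flag, and the use of plain integers 4 and 5 for the MetadataVersion are all assumptions made for this example. The sketch prefers signed types, falls back to an unsigned type only when the caller asks for compactness, never picks uint64, and refuses to write unsigned indices under V4.

```python
# Hypothetical helpers for illustration; none of these names exist in Arrow.

# (type name, largest index value the type can hold)
SIGNED_TYPES = [
    ("int8", 2**7 - 1),
    ("int16", 2**15 - 1),
    ("int32", 2**31 - 1),
    ("int64", 2**63 - 1),
]
# uint64 is deliberately left out, following the second caveat.
UNSIGNED_TYPES = [
    ("uint8", 2**8 - 1),
    ("uint16", 2**16 - 1),
    ("uint32", 2**32 - 1),
]


def choose_index_type(dictionary_size, compact=False):
    """Return the narrowest index type that can address every entry.

    With compact=False only signed types are considered. With compact=True
    an unsigned type is chosen only when it is narrower than the smallest
    signed type that would fit.
    """
    if dictionary_size < 0:
        raise ValueError("dictionary_size must be non-negative")
    # The largest index used is size - 1 (indices start at 0).
    max_index = max(dictionary_size - 1, 0)
    candidates = list(SIGNED_TYPES)
    if compact:
        candidates += UNSIGNED_TYPES
    # Sorting by capacity puts intN ahead of uintN, so signed wins ties in width.
    for name, limit in sorted(candidates, key=lambda c: c[1]):
        if max_index <= limit:
            return name
    raise ValueError("dictionary too large for the supported index types")


def check_can_serialize(index_type, metadata_version):
    """Reject unsigned indices for metadata versions older than V5."""
    if index_type.startswith("uint") and metadata_version < 5:
        raise ValueError(
            f"{index_type} dictionary indices require MetadataVersion V5 or later"
        )


if __name__ == "__main__":
    print(choose_index_type(200))                # signed only
    print(choose_index_type(200, compact=True))  # unsigned allowed
    check_can_serialize("uint8", 5)              # accepted under V5
```

A writer that defaults to `compact=False` only produces unsigned indices when an application opts in, which matches the recommendation to prefer signed integers.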