I'd like to understand if any one is making use of the following features and if we should revisit them before 1.0.
1. Dictionaries can encode null values. - This become error prone for things like parquet. We seem to be calculating the definition level solely based on the null bitmap. I might have missed something but it appears that we only check if a dictionary contains nulls on the optimized path [1] but not when converting the dictionary array back to dense, so I think the values written could get out of sync with the rep/def levels? It seems we should potentially disallow dictionaries to contain null values? 2. Dictionaries can nested columns which are in turn dictionary encoded columns. - Again we aren't handling this in Parquet today, and I'm wondering if it worth the effort. There was a PR merged a while ago [2] to add a "skipped" integration test but it doesn't look like anyone has done follow-up work to make enable this/make it pass. It seems simpler to keep dictionary encoding at the leafs of the schema. Of the two I'm a little more worried that Option #1 will break people if we decide to disallow it. Thoughts? Thanks, Micah [1] https://github.com/apache/arrow/blob/bd38beec033a2fdff192273df9b08f120e635b0c/cpp/src/parquet/encoding.cc#L765 [2] https://github.com/apache/arrow/pull/1848