I can provide some context here. The field is very old, and when we realized that it was not only unused but also difficult to produce and use in practice (distinct counts from separate files can't be combined), we deprecated it. However, some folks from Dremio wanted to bring it back because they said they could store values there and had a way to use them.
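To illustrate why per-file distinct counts can't be combined, here is a minimal Python sketch with made-up column values. The file contents and helper names are hypothetical and not part of the Iceberg spec or any implementation. Summing exact per-file counts double-counts values that appear in more than one file, so from the counts alone you can only derive loose bounds on the table-wide number of distinct values.

```python
def exact_ndv(values):
    """Exact number of distinct values in one (hypothetical) file's column."""
    return len(set(values))


def ndv_bounds(per_file_ndv):
    """Best bounds on the table-wide NDV given only exact per-file counts.

    The union has at least as many distinct values as the largest file,
    and at most the sum over all files (reached only if no value is shared).
    """
    return max(per_file_ndv), sum(per_file_ndv)


# Hypothetical column values from two data files of the same table.
file_a = ["us", "eu", "ap"]
file_b = ["us", "eu", "sa"]

counts = [exact_ndv(file_a), exact_ndv(file_b)]
low, high = ndv_bounds(counts)

# Only possible here because we still have the raw values, which a
# per-file count does not preserve.
true_ndv = len(set(file_a) | set(file_b))

print(counts)       # [3, 3]
print((low, high))  # (3, 6)
print(true_ndv)     # 4
```

The naive sum (6) overstates the real answer (4), and nothing in the two stored counts tells you where between 3 and 6 the true value lies.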
+1, but it would be good to check in with some Dremio engineers and see if they are using it. I assume they aren't, since this thread hasn't gotten much attention. Thanks for bringing this up!

On Thu, Feb 13, 2025 at 8:02 AM Jacob Marble <jacobmar...@firetiger.com.invalid> wrote:
> Xuanwo, do you favor deprecating or removing `distinct_count`?
>
> Due to lack of any real implementation, I myself favor removal (PR 12183).
>
> Jacob Marble
> 🔥🐅
>
>
> On Tue, Feb 11, 2025 at 10:25 PM Xuanwo <xua...@apache.org> wrote:
>
>> Here is my +1 binding.
>>
>> The current status of `distinct_count` is quite confusing, which has also
>> led to additional discussions in `iceberg-rust` about whether we need to
>> add it and how to maintain it.
>>
>> Removing it seems reasonable to me, as there are no known use cases for
>> `distinct_count` in a single data file.
>>
>> On Tue, Feb 11, 2025, at 23:05, Fokko Driesprong wrote:
>>
>> My mistake, I suggested sending out an email with a quick vote on the PR.
>> I like the suggestion to use this thread for discussion since the number of
>> options is limited.
>>
>> I'm in favor of deprecating the field, to avoid re-using the
>> field ID in the future.
>>
>> Kind regards,
>> Fokko
>>
>> On Tue, Feb 11, 2025 at 05:46, Manu Zhang <owenzhang1...@gmail.com> wrote:
>>
>> Hi Jacob,
>>
>> Thanks for initiating the vote.
>> Typically, we would first have a DISCUSSION thread to reach a consensus
>> on the preferred option and then follow it up with a VOTE thread for
>> confirmation.
>>
>> Maybe we can take this as a DISCUSSION thread?
>>
>> Best,
>> Manu
>>
>>
>> On Tue, Feb 11, 2025 at 7:20 AM Jacob Marble
>> <jacobmar...@firetiger.com.invalid> wrote:
>>
>> This vote will be open for at least 72 hours.
>>
>> I propose that distinct_counts be either deprecated (#12182
>> <https://github.com/apache/iceberg/pull/12182>) or removed (#12183
>> <https://github.com/apache/iceberg/pull/12183>) from the spec.
>>
>> According to #767 <https://github.com/apache/iceberg/issues/767>,
>> data_file.distinct_counts was deprecated about four years ago. Furthermore,
>> it is not implemented in the canonical Java and Python implementations.
>>
>> Please share your thoughts, and vote for one of the following:
>> - remove
>> - deprecate
>> - no-op
>>
>> Jacob Marble
>> 🔥🐅
>>
>> Xuanwo
>>
>> https://xuanwo.io/
>>