Here is my +1 binding. The current status of `distinct_count` is quite confusing, which has also led to additional discussions in `iceberg-rust` about whether we need to add it and how to maintain it.
Removing it seems reasonable to me, as there are no known use cases for `distinct_count` in a single data file.

On Tue, Feb 11, 2025, at 23:05, Fokko Driesprong wrote:
> My mistake, I suggested sending out an email with a quick vote on the PR. I
> like the suggestion to use this thread for discussion since the number of
> options is limited.
>
> I'm in favor of deprecating the field, to avoid that we re-use the field-id
> in the future.
>
> Kind regards,
> Fokko
>
> On Tue, Feb 11, 2025 at 05:46, Manu Zhang <owenzhang1...@gmail.com> wrote:
>> Hi Jacob,
>>
>> Thanks for initiating the vote.
>> Typically, we would first have a DISCUSSION thread to reach a consensus on
>> the preferred option and then follow it up with a VOTE thread for
>> confirmation.
>>
>> Maybe we can take this as a DISCUSSION thread?
>>
>> Best,
>> Manu
>>
>>
>> On Tue, Feb 11, 2025 at 7:20 AM Jacob Marble
>> <jacobmar...@firetiger.com.invalid> wrote:
>>> This vote will be open for at least 72 hours.
>>>
>>> I propose that distinct_counts be either deprecated (#12182
>>> <https://github.com/apache/iceberg/pull/12182>) or removed (#12183
>>> <https://github.com/apache/iceberg/pull/12183>) from the spec.
>>>
>>> According to #767 <https://github.com/apache/iceberg/issues/767>
>>> data_file.distinct_counts was deprecated about four years ago. Furthermore,
>>> it is not implemented in the canonical Java and Python implementations.
>>>
>>> Please share your thoughts, and vote one of the following:
>>> - remove
>>> - deprecate
>>> - no-op
>>>
>>> Jacob Marble
>>> 🔥🐅

Xuanwo
https://xuanwo.io/