+1 to deprecate it again and remove it later on. I did some digging and found out that Dremio was interested in this field for secondary indexes. https://lists.apache.org/thread/z948wfssgvrpf9b3g6660gh5cxb2d3sn
But we didn't make progress on that.

- Ajantha

On Tue, Feb 25, 2025 at 5:03 AM Scott Cowell <scowell.0...@gmail.com> wrote:

> Speaking for Dremio, I checked and we're not using distinct_counts
> anywhere, we interact with manifests exclusively through the Iceberg Java
> API which as mentioned doesn't support this field. I'm in favor of
> removing it, I didn't even know it existed as I tend to look at the Java
> DataFile/ContentFile interfaces when browsing the metadata structure vs.
> going to the spec 😂
>
>
> On Mon, Feb 24, 2025 at 3:00 PM rdb...@gmail.com <rdb...@gmail.com> wrote:
>
>> I can provide some context here. The field is very old and when we
>> realized that it was not only unused but also difficult to produce and use
>> in practice (can't be combined) we deprecated the field. However, some
>> folks from Dremio wanted to bring it back because they said they could
>> store values there and had a way to use them.
>>
>> +1, but it would be good to check in with some Dremio engineers and see
>> if they are using it. I assume they aren't since this thread hasn't gotten
>> much attention. Thanks for bringing this up!
>>
>> On Thu, Feb 13, 2025 at 8:02 AM Jacob Marble
>> <jacobmar...@firetiger.com.invalid> wrote:
>>
>>> Xuanwo, do you favor deprecating or removing `distinct_count`?
>>>
>>> Due to lack of any real implementation, I myself favor removal (PR
>>> 12183).
>>>
>>> Jacob Marble
>>> 🔥🐅
>>>
>>>
>>> On Tue, Feb 11, 2025 at 10:25 PM Xuanwo <xua...@apache.org> wrote:
>>>
>>>> Here is my +1 binding.
>>>>
>>>> The current status of `distinct_count` is quite confusing, which has
>>>> also led to additional discussions in `iceberg-rust` about whether we need
>>>> to add it and how to maintain it.
>>>>
>>>> Removing it seems reasonable to me, as there are no known use cases for
>>>> `distinct_count` in a single data file.
>>>>
>>>> On Tue, Feb 11, 2025, at 23:05, Fokko Driesprong wrote:
>>>>
>>>> My mistake, I suggested sending out an email with a quick vote on the
>>>> PR. I like the suggestion to use this thread for discussion since the
>>>> number of options is limited.
>>>>
>>>> I'm in favor of deprecating the field, to avoid re-using the
>>>> field-id in the future.
>>>>
>>>> Kind regards,
>>>> Fokko
>>>>
>>>> On Tue, Feb 11, 2025 at 05:46, Manu Zhang <owenzhang1...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi Jacob,
>>>>
>>>> Thanks for initiating the vote.
>>>> Typically, we would first have a DISCUSSION thread to reach a consensus
>>>> on the preferred option and then follow it up with a VOTE thread for
>>>> confirmation.
>>>>
>>>> Maybe we can take this as a DISCUSSION thread?
>>>>
>>>> Best,
>>>> Manu
>>>>
>>>>
>>>> On Tue, Feb 11, 2025 at 7:20 AM Jacob Marble
>>>> <jacobmar...@firetiger.com.invalid> wrote:
>>>>
>>>> This vote will be open for at least 72 hours.
>>>>
>>>> I propose that distinct_counts be either deprecated (#12182
>>>> <https://github.com/apache/iceberg/pull/12182>) or removed (#12183
>>>> <https://github.com/apache/iceberg/pull/12183>) from the spec.
>>>>
>>>> According to #767 <https://github.com/apache/iceberg/issues/767>,
>>>> data_file.distinct_counts was deprecated about four years ago. Furthermore,
>>>> it is not implemented in the canonical Java and Python implementations.
>>>>
>>>> Please share your thoughts, and vote for one of the following:
>>>> - remove
>>>> - deprecate
>>>> - no-op
>>>>
>>>> Jacob Marble
>>>> 🔥🐅
>>>>
>>>> Xuanwo
>>>>
>>>> https://xuanwo.io/
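A small sketch of the "can't be combined" problem raised in the thread, assuming exact per-file counts like the ones data_file.distinct_counts was meant to hold. The file contents and the distinct_count helper are hypothetical, and nothing here uses the Iceberg API.

```python
# Hypothetical illustration: exact per-file distinct counts (the kind of value
# data_file.distinct_counts would store) cannot be merged into a count that
# covers several files.

def distinct_count(values):
    """Exact number of distinct values in one file's column (hypothetical helper)."""
    return len(set(values))

# Two hypothetical data files whose column values partly overlap.
file_a = [1, 2, 3, 3]
file_b = [2, 3, 4]

per_file = [distinct_count(file_a), distinct_count(file_b)]  # [3, 3]
summed = sum(per_file)                                       # 6
max_of = max(per_file)                                       # 3
actual = distinct_count(file_a + file_b)                     # 4: {1, 2, 3, 4}

# Summing overcounts values shared between files; taking the max misses
# values that appear in only one of the files.
print(per_file, summed, max_of, actual)
```

From the stored numbers alone, a reader can only bound the combined count: it lies somewhere between the largest per-file count and the sum of all per-file counts.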