Thanks everyone. It seems like there is a consensus, and I'll go ahead and mark the field as deprecated for now to avoid any future confusion.

Kind regards,
Fokko

On Tue, Feb 25, 2025 at 00:54 Ajantha Bhat <ajanthab...@gmail.com> wrote:
> +1 to deprecate it again and remove it later on.
>
> I did some digging and found out that Dremio was interested in this field
> for secondary indexes.
> https://lists.apache.org/thread/z948wfssgvrpf9b3g6660gh5cxb2d3sn
>
> But we didn't make progress on that.
>
> - Ajantha
>
> On Tue, Feb 25, 2025 at 5:03 AM Scott Cowell <scowell.0...@gmail.com>
> wrote:
>
>> Speaking for Dremio, I checked and we're not using distinct_counts
>> anywhere; we interact with manifests exclusively through the Iceberg Java
>> API, which, as mentioned, doesn't support this field. I'm in favor of
>> removing it. I didn't even know it existed, as I tend to look at the Java
>> DataFile/ContentFile interfaces when browsing the metadata structure vs.
>> going to the spec 😂
>>
>>
>> On Mon, Feb 24, 2025 at 3:00 PM rdb...@gmail.com <rdb...@gmail.com>
>> wrote:
>>
>>> I can provide some context here. The field is very old, and when we
>>> realized that it was not only unused but also difficult to produce and use
>>> in practice (can't be combined), we deprecated the field. However, some
>>> folks from Dremio wanted to bring it back because they said they could
>>> store values there and had a way to use them.
>>>
>>> +1, but it would be good to check in with some Dremio engineers and see
>>> if they are using it. I assume they aren't since this thread hasn't gotten
>>> much attention. Thanks for bringing this up!
>>>
>>> On Thu, Feb 13, 2025 at 8:02 AM Jacob Marble
>>> <jacobmar...@firetiger.com.invalid> wrote:
>>>
>>>> Xuanwo, do you favor deprecating or removing `distinct_count`?
>>>>
>>>> Due to lack of any real implementation, I myself favor removal (PR
>>>> 12183).
>>>>
>>>> Jacob Marble
>>>> 🔥🐅
>>>>
>>>>
>>>> On Tue, Feb 11, 2025 at 10:25 PM Xuanwo <xua...@apache.org> wrote:
>>>>
>>>>> Here is my +1 binding.
>>>>>
>>>>> The current status of `distinct_count` is quite confusing, which has
>>>>> also led to additional discussions in `iceberg-rust` about whether we need
>>>>> to add it and how to maintain it.
>>>>>
>>>>> Removing it seems reasonable to me, as there are no known use cases
>>>>> for `distinct_count` in a single data file.
>>>>>
>>>>> On Tue, Feb 11, 2025, at 23:05, Fokko Driesprong wrote:
>>>>>
>>>>> My mistake, I suggested sending out an email with a quick vote on the
>>>>> PR. I like the suggestion to use this thread for discussion since the
>>>>> number of options is limited.
>>>>>
>>>>> I'm in favor of deprecating the field, to avoid re-using the
>>>>> field-id in the future.
>>>>>
>>>>> Kind regards,
>>>>> Fokko
>>>>>
>>>>> On Tue, Feb 11, 2025 at 05:46 Manu Zhang <owenzhang1...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hi Jacob,
>>>>>
>>>>> Thanks for initiating the vote.
>>>>> Typically, we would first have a DISCUSSION thread to reach a
>>>>> consensus on the preferred option and then follow it up with a VOTE thread
>>>>> for confirmation.
>>>>>
>>>>> Maybe we can take this as a DISCUSSION thread?
>>>>>
>>>>> Best,
>>>>> Manu
>>>>>
>>>>>
>>>>> On Tue, Feb 11, 2025 at 7:20 AM Jacob Marble
>>>>> <jacobmar...@firetiger.com.invalid> wrote:
>>>>>
>>>>> This vote will be open for at least 72 hours.
>>>>>
>>>>> I propose that distinct_counts be either deprecated (#12182
>>>>> <https://github.com/apache/iceberg/pull/12182>) or removed (#12183
>>>>> <https://github.com/apache/iceberg/pull/12183>) from the spec.
>>>>>
>>>>> According to #767 <https://github.com/apache/iceberg/issues/767>,
>>>>> data_file.distinct_counts was deprecated about four years ago.
>>>>> Furthermore, it is not implemented in the canonical Java and Python
>>>>> implementations.
>>>>>
>>>>> Please share your thoughts, and vote for one of the following:
>>>>> - remove
>>>>> - deprecate
>>>>> - no-op
>>>>>
>>>>> Jacob Marble
>>>>> 🔥🐅
>>>>>
>>>>> Xuanwo
>>>>>
>>>>> https://xuanwo.io/