Speaking for Dremio, I checked and we're not using distinct_counts
anywhere. We interact with manifests exclusively through the Iceberg Java
API, which, as mentioned, doesn't support this field. I'm in favor of
removing it. I didn't even know it existed, as I tend to look at the Java
DataFile/ContentFile interfaces when browsing the metadata structure rather
than going to the spec 😂


On Mon, Feb 24, 2025 at 3:00 PM rdb...@gmail.com <rdb...@gmail.com> wrote:

> I can provide some context here. The field is very old, and when we
> realized that it was not only unused but also difficult to produce and use
> in practice (per-file distinct counts can't be combined across files), we
> deprecated the field. However, some folks from Dremio wanted to bring it
> back because they said they could store values there and had a way to use
> them.
>
> +1, but it would be good to check in with some Dremio engineers and see if
> they are using it. I assume they aren't since this thread hasn't gotten
> much attention. Thanks for bringing this up!
>
> On Thu, Feb 13, 2025 at 8:02 AM Jacob Marble
> <jacobmar...@firetiger.com.invalid> wrote:
>
>> Xuanwo, do you favor deprecating or removing `distinct_count`?
>>
>> Due to lack of any real implementation, I myself favor removal (PR 12183).
>>
>> Jacob Marble
>> 🔥🐅
>>
>>
>> On Tue, Feb 11, 2025 at 10:25 PM Xuanwo <xua...@apache.org> wrote:
>>
>>> Here is my +1 binding.
>>>
>>> The current status of `distinct_count` is quite confusing, which has
>>> also led to additional discussions in `iceberg-rust` about whether we need
>>> to add it and how to maintain it.
>>>
>>> Removing it seems reasonable to me, as there are no known use cases for
>>> `distinct_count` in a single data file.
>>>
>>> On Tue, Feb 11, 2025, at 23:05, Fokko Driesprong wrote:
>>>
>>> My mistake, I suggested sending out an email with a quick vote on the
>>> PR. I like the suggestion to use this thread for discussion since the
>>> number of options is limited.
>>>
>>> I'm in favor of deprecating the field, so that we don't reuse the
>>> field ID in the future.
>>>
>>> Kind regards,
>>> Fokko
>>>
>>> Op di 11 feb 2025 om 05:46 schreef Manu Zhang <owenzhang1...@gmail.com>:
>>>
>>> Hi Jacob,
>>>
>>> Thanks for initiating the vote.
>>> Typically, we would first have a DISCUSSION thread to reach a consensus
>>> on the preferred option and then follow it up with a VOTE thread for
>>> confirmation.
>>>
>>> Maybe we can take this as a DISCUSSION thread?
>>>
>>> Best,
>>> Manu
>>>
>>>
>>> On Tue, Feb 11, 2025 at 7:20 AM Jacob Marble
>>> <jacobmar...@firetiger.com.invalid> wrote:
>>>
>>> This vote will be open for at least 72 hours.
>>>
>>> I propose that distinct_counts be either deprecated (#12182
>>> <https://github.com/apache/iceberg/pull/12182>) or removed (#12183
>>> <https://github.com/apache/iceberg/pull/12183>) from the spec.
>>>
>>> According to #767 <https://github.com/apache/iceberg/issues/767>
>>> data_file.distinct_counts was deprecated about four years ago. Furthermore,
>>> it is not implemented in the canonical Java and Python implementations.
>>>
>>> Please share your thoughts, and vote one of the following:
>>> - remove
>>> - deprecate
>>> - no-op
>>>
>>> Jacob Marble
>>> 🔥🐅
>>>
>>> Xuanwo
>>>
>>> https://xuanwo.io/
>>>
>>>
