Thanks everyone. It seems like there is a consensus, and I'll go ahead and
mark the field as deprecated for now to avoid any future confusion.

Kind regards,
Fokko

On Tue, Feb 25, 2025 at 00:54, Ajantha Bhat <ajanthab...@gmail.com> wrote:

> +1 to deprecate it again and remove it later on.
>
> I did some digging and found out that Dremio was interested in this field
> for secondary indexes.
> https://lists.apache.org/thread/z948wfssgvrpf9b3g6660gh5cxb2d3sn
>
> But we didn't make progress on that.
>
> - Ajantha
>
> On Tue, Feb 25, 2025 at 5:03 AM Scott Cowell <scowell.0...@gmail.com>
> wrote:
>
>> Speaking for Dremio, I checked and we're not using distinct_counts
>> anywhere. We interact with manifests exclusively through the Iceberg Java
>> API, which as mentioned doesn't support this field. I'm in favor of
>> removing it. I didn't even know it existed, as I tend to look at the Java
>> DataFile/ContentFile interfaces when browsing the metadata structure vs.
>> going to the spec 😂
>>
>>
>> On Mon, Feb 24, 2025 at 3:00 PM rdb...@gmail.com <rdb...@gmail.com>
>> wrote:
>>
>>> I can provide some context here. The field is very old, and when we
>>> realized that it was not only unused but also difficult to produce and use
>>> in practice (can't be combined), we deprecated the field. However, some
>>> folks from Dremio wanted to bring it back because they said they could
>>> store values there and had a way to use them.
>>>
>>> +1, but it would be good to check in with some Dremio engineers and see
>>> if they are using it. I assume they aren't since this thread hasn't gotten
>>> much attention. Thanks for bringing this up!
>>>
>>> On Thu, Feb 13, 2025 at 8:02 AM Jacob Marble
>>> <jacobmar...@firetiger.com.invalid> wrote:
>>>
>>>> Xuanwo, do you favor deprecating or removing `distinct_count`?
>>>>
>>>> Due to lack of any real implementation, I myself favor removal (PR
>>>> 12183).
>>>>
>>>> Jacob Marble
>>>> 🔥🐅
>>>>
>>>>
>>>> On Tue, Feb 11, 2025 at 10:25 PM Xuanwo <xua...@apache.org> wrote:
>>>>
>>>>> Here is my +1 binding.
>>>>>
>>>>> The current status of `distinct_count` is quite confusing, which has
>>>>> also led to additional discussions in `iceberg-rust` about whether we need
>>>>> to add it and how to maintain it.
>>>>>
>>>>> Removing it seems reasonable to me, as there are no known use cases
>>>>> for `distinct_count` in a single data file.
>>>>>
>>>>> On Tue, Feb 11, 2025, at 23:05, Fokko Driesprong wrote:
>>>>>
>>>>> My mistake, I suggested sending out an email with a quick vote on the
>>>>> PR. I like the suggestion to use this thread for discussion since the
>>>>> number of options is limited.
>>>>>
>>>>> I'm in favor of deprecating the field, so that we don't re-use the
>>>>> field-id in the future.
>>>>>
>>>>> Kind regards,
>>>>> Fokko
>>>>>
>>>>> On Tue, Feb 11, 2025 at 05:46, Manu Zhang <owenzhang1...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hi Jacob,
>>>>>
>>>>> Thanks for initiating the vote.
>>>>> Typically, we would first have a DISCUSSION thread to reach a
>>>>> consensus on the preferred option and then follow it up with a VOTE thread
>>>>> for confirmation.
>>>>>
>>>>> Maybe we can take this as a DISCUSSION thread?
>>>>>
>>>>> Best,
>>>>> Manu
>>>>>
>>>>>
>>>>> On Tue, Feb 11, 2025 at 7:20 AM Jacob Marble
>>>>> <jacobmar...@firetiger.com.invalid> wrote:
>>>>>
>>>>> This vote will be open for at least 72 hours.
>>>>>
>>>>> I propose that distinct_counts be either deprecated (#12182
>>>>> <https://github.com/apache/iceberg/pull/12182>) or removed (#12183
>>>>> <https://github.com/apache/iceberg/pull/12183>) from the spec.
>>>>>
>>>>> According to #767 <https://github.com/apache/iceberg/issues/767>,
>>>>> data_file.distinct_counts was deprecated about four years ago.
>>>>> Furthermore, it is not implemented in the canonical Java and Python
>>>>> implementations.
>>>>>
>>>>> Please share your thoughts, and vote one of the following:
>>>>> - remove
>>>>> - deprecate
>>>>> - no-op
>>>>>
>>>>> Jacob Marble
>>>>> 🔥🐅
>>>>>
>>>>> Xuanwo
>>>>>
>>>>> https://xuanwo.io/
>>>>>
>>>>>
