+1 to deprecate it again and remove it later on.

I did some digging and found out that Dremio was interested in this field
for secondary indexes.
https://lists.apache.org/thread/z948wfssgvrpf9b3g6660gh5cxb2d3sn

But we didn't make progress on that.

- Ajantha

On Tue, Feb 25, 2025 at 5:03 AM Scott Cowell <scowell.0...@gmail.com> wrote:

> Speaking for Dremio, I checked and we're not using distinct_counts
> anywhere; we interact with manifests exclusively through the Iceberg Java
> API, which as mentioned doesn't support this field. I'm in favor of
> removing it. I didn't even know it existed, as I tend to look at the Java
> DataFile/ContentFile interfaces when browsing the metadata structure vs.
> going to the spec 😂
>
>
> On Mon, Feb 24, 2025 at 3:00 PM rdb...@gmail.com <rdb...@gmail.com> wrote:
>
>> I can provide some context here. The field is very old, and when we
>> realized that it was not only unused but also difficult to produce and use
>> in practice (distinct counts from different files can't be combined), we
>> deprecated the field. However, some folks from Dremio wanted to bring it
>> back because they said they could store values there and had a way to use
>> them.
>>
>> +1, but it would be good to check in with some Dremio engineers and see
>> if they are using it. I assume they aren't since this thread hasn't gotten
>> much attention. Thanks for bringing this up!
>>
>> On Thu, Feb 13, 2025 at 8:02 AM Jacob Marble
>> <jacobmar...@firetiger.com.invalid> wrote:
>>
>>> Xuanwo, do you favor deprecating or removing `distinct_count`?
>>>
>>> Due to lack of any real implementation, I myself favor removal (PR
>>> 12183).
>>>
>>> Jacob Marble
>>> 🔥🐅
>>>
>>>
>>> On Tue, Feb 11, 2025 at 10:25 PM Xuanwo <xua...@apache.org> wrote:
>>>
>>>> Here is my +1 binding.
>>>>
>>>> The current status of `distinct_count` is quite confusing, which has
>>>> also led to additional discussions in `iceberg-rust` about whether we need
>>>> to add it and how to maintain it.
>>>>
>>>> Removing it seems reasonable to me, as there are no known use cases for
>>>> `distinct_count` in a single data file.
>>>>
>>>> On Tue, Feb 11, 2025, at 23:05, Fokko Driesprong wrote:
>>>>
>>>> My mistake, I suggested sending out an email with a quick vote on the
>>>> PR. I like the suggestion to use this thread for discussion since the
>>>> number of options is limited.
>>>>
>>>> I'm in favor of deprecating the field, so that we don't reuse the
>>>> field-id in the future.
>>>>
>>>> Kind regards,
>>>> Fokko
>>>>
>>>> On Tue, Feb 11, 2025 at 05:46, Manu Zhang <owenzhang1...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi Jacob,
>>>>
>>>> Thanks for initiating the vote.
>>>> Typically, we would first have a DISCUSSION thread to reach a consensus
>>>> on the preferred option and then follow it up with a VOTE thread for
>>>> confirmation.
>>>>
>>>> Maybe we can take this as a DISCUSSION thread?
>>>>
>>>> Best,
>>>> Manu
>>>>
>>>>
>>>> On Tue, Feb 11, 2025 at 7:20 AM Jacob Marble
>>>> <jacobmar...@firetiger.com.invalid> wrote:
>>>>
>>>> This vote will be open for at least 72 hours.
>>>>
>>>> I propose that distinct_counts be either deprecated (#12182
>>>> <https://github.com/apache/iceberg/pull/12182>) or removed (#12183
>>>> <https://github.com/apache/iceberg/pull/12183>) from the spec.
>>>>
>>>> According to #767 <https://github.com/apache/iceberg/issues/767>
>>>> data_file.distinct_counts was deprecated about four years ago. Furthermore,
>>>> it is not implemented in the canonical Java and Python implementations.
>>>>
>>>> Please share your thoughts, and vote for one of the following:
>>>> - remove
>>>> - deprecate
>>>> - no-op
>>>>
>>>> Jacob Marble
>>>> 🔥🐅
>>>>
>>>> Xuanwo
>>>>
>>>> https://xuanwo.io/
>>>>
>>>>
