Thanks, I will have a PR to update the implementation.
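
A minimal Python sketch of the `total_record_count` rule described in the quoted reply below. It assumes a hypothetical per-partition summary: `PartitionDeleteInfo`, its field names, and `total_record_count()` are illustrative only and are not Iceberg APIs. The count is reported as None (unknown) whenever equality deletes or V2 position delete files are present. Otherwise, the DV cardinalities are subtracted from the data record count without scanning any data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartitionDeleteInfo:
    # Hypothetical per-partition summary; field names are illustrative, not Iceberg API.
    data_record_count: int
    dv_cardinality: int  # sum of deletion vector cardinalities in the partition (V3)
    equality_delete_file_count: int
    v2_position_delete_file_count: int


def total_record_count(info: PartitionDeleteInfo) -> Optional[int]:
    """Return the live record count, or None when it is unknown without a scan."""
    if info.equality_delete_file_count > 0 or info.v2_position_delete_file_count > 0:
        # Equality deletes or V2 position delete files would require scanning data,
        # so the field stays NULL, meaning "unknown".
        return None
    # Only DVs (or no deletes at all): each DV cardinality counts distinct deleted
    # positions in one data file, so subtracting the sum gives the live rows.
    return info.data_record_count - info.dv_cardinality
```

Returning None rather than 0 keeps "unknown" distinct from "no records", in line with the point that NULL should mean unknown.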

On Tue, Feb 4, 2025 at 8:26 AM Anton Okolnychyi <aokolnyc...@gmail.com>
wrote:

> Oh, I think we actually missed handling that particular count correctly.
> The total number of records must stay optional as it is expensive to
> compute with V2 delete files. In V3, however, we can reliably use DV
> cardinalities to populate that field without scanning data if there are no
> equality deletes and no existing V2 delete files. That said, it should
> still be optional and NULL should mean unknown.
>
> On Mon, Feb 3, 2025 at 4:52 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>
>> SGTM,
>>
>> Along with those counters, we should also update `total_record_count`,
>> since during implementation we decided to keep the behavior of all
>> counters the same when they are not computed.
>>
>> - Ajantha
>>
>>
>> On Sat, Feb 1, 2025 at 2:38 PM Russell Spitzer <russell.spit...@gmail.com>
>> wrote:
>>
>>> Sounds reasonable. I think the intent was that N/A is different from 0,
>>> but that only makes sense for V1. For V2/V3, 0 makes sense.
>>>
>>> On Sat, Feb 1, 2025 at 3:15 AM Anton Okolnychyi <aokolnyc...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I propose clarifying how delete counts are handled in partition stats.
>>>> The following metrics are currently marked as optional:
>>>>
>>>> - position_delete_record_count
>>>> - position_delete_file_count
>>>> - equality_delete_record_count
>>>> - equality_delete_file_count
>>>>
>>>> If I remember correctly, the reasoning behind this was that tables may
>>>> have no deletes, hence the counts are optional. The problem is that this
>>>> creates confusion for readers: does null mean unknown or absent? I propose
>>>> we clarify in the spec that a missing count means 0 for V1/V2 tables (this
>>>> is the current behavior in the Java implementation) and make the counts
>>>> required in V3.
>>>>
>>>> What does everybody think?
>>>>
>>>> - Anton
>>>>
>>>
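
As a rough illustration of the semantics proposed above, a reader could interpret the four delete counts as follows. `read_delete_count` and the dict-based stats record are hypothetical. Only the field names come from the proposal, and the V3 rule reflects the proposal rather than a finalized spec.

```python
from typing import Optional

# Field names as listed in the proposal; the reader helper below is hypothetical.
DELETE_COUNT_FIELDS = (
    "position_delete_record_count",
    "position_delete_file_count",
    "equality_delete_record_count",
    "equality_delete_file_count",
)


def read_delete_count(stats: dict, field: str, format_version: int) -> int:
    """Interpret a delete count under the semantics proposed in this thread."""
    if field not in DELETE_COUNT_FIELDS:
        raise KeyError(f"not a delete count field: {field}")
    value: Optional[int] = stats.get(field)
    if value is not None:
        return value
    if format_version >= 3:
        # Proposed: the counts become required in V3, so a null is invalid.
        raise ValueError(f"{field} is required for format version {format_version}")
    # V1/V2: a missing count means 0 (the Java behavior described in the thread).
    return 0
```

Treating null as 0 for V1/V2 matches Russell's note that 0 is the meaningful value once deletes exist. Making the fields required in V3 removes the unknown-vs-absent ambiguity entirely.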
