Oh, I think we actually missed handling that particular count correctly.
The total number of records must stay optional, as it is expensive to
compute with V2 delete files. In V3, however, we can reliably use DV
cardinalities to populate that field without scanning data if there are no
equality deletes and no existing V2 delete files. Even so, it should
remain optional, with NULL meaning unknown.
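The counting rules discussed in this thread can be summarized in a small sketch. It assumes a hypothetical `PartitionDeleteSummary` record and hypothetical helpers `delete_count` and `total_record_count`; none of these are Iceberg APIs. Following the proposal quoted below, a missing delete counter reads as 0 for V1/V2 tables and is invalid in V3. `total_record_count` is derived from DV cardinalities only when the partition has no equality deletes and no V2 position delete files; otherwise it stays None (unknown).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PartitionDeleteSummary:
    """Hypothetical per-partition summary; field names are illustrative only."""
    data_record_count: int              # rows written to data files
    dv_cardinalities: Sequence[int]     # deleted positions per deletion vector (V3)
    equality_delete_file_count: int
    v2_position_delete_file_count: int  # position delete files written before V3


def delete_count(stored: Optional[int], format_version: int) -> int:
    """Interpret an optional delete counter under the proposed semantics."""
    if stored is not None:
        return stored
    if format_version <= 2:
        return 0  # absent means "no deletes" for V1/V2 tables
    raise ValueError("delete counts are required in V3")


def total_record_count(s: PartitionDeleteSummary) -> Optional[int]:
    """Return live rows when they can be known without scanning data, else None."""
    if s.equality_delete_file_count > 0 or s.v2_position_delete_file_count > 0:
        return None  # would require a scan, so report "unknown"
    # Each DV cardinality is an exact count of deleted rows in one data file.
    return s.data_record_count - sum(s.dv_cardinalities)
```

Returning None instead of an estimate keeps NULL meaning "unknown" rather than silently turning it into a number readers might trust.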

On Mon, Feb 3, 2025 at 4:52 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:

> SGTM,
>
> Along with those counters, we should also update `total_record_count`,
> since during implementation we decided to keep all counter behavior the
> same when not computed.
>
> - Ajantha
>
>
> On Sat, Feb 1, 2025 at 2:38 PM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
>> Sounds reasonable. I think the intent was that N/A is different from 0,
>> but that only makes sense for V1. For V2/V3, 0 makes sense.
>>
>> On Sat, Feb 1, 2025 at 3:15 AM Anton Okolnychyi <aokolnyc...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I propose to clarify our delete counts handling in partition stats. We
>>> have the following metrics that are marked as optional:
>>>
>>> - position_delete_record_count
>>> - position_delete_file_count
>>> - equality_delete_record_count
>>> - equality_delete_file_count
>>>
>>> If I remember correctly, the reasoning behind this was that tables may
>>> have no deletes, hence the counts are optional. The problem is that it
>>> creates confusion for readers. Does null mean unknown or absent? I propose
>>> we clarify in the spec that a missing count means 0 for V1/V2 tables (this
>>> is the current behavior in the Java implementation) and make the counts
>>> required in V3.
>>>
>>> What does everybody think?
>>>
>>> - Anton
>>>
>>
