Thanks, I will have a PR to update the implementation.

On Tue, Feb 4, 2025 at 8:26 AM Anton Okolnychyi <aokolnyc...@gmail.com> wrote:
> Oh, I think we actually failed to handle that particular count correctly.
> The total number of records must stay optional as it is expensive to
> compute with V2 delete files. In V3, however, we can reliably use DV
> cardinalities to populate that field without scanning data if there are no
> equality deletes and no existing V2 delete files. That said, it should
> still be optional and NULL should mean unknown.
>
> On Mon, Feb 3, 2025 at 4:52 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>
>> SGTM,
>>
>> Along with those counters, we should also update the `total_record_count`
>> as during implementation we decided to keep all counter behavior the same
>> if not computed.
>>
>> - Ajantha
>>
>>
>> On Sat, Feb 1, 2025 at 2:38 PM Russell Spitzer <russell.spit...@gmail.com>
>> wrote:
>>
>>> Sounds reasonable, I think the intent was that N/A is different than 0
>>> but that only makes sense for V1. For V2/V3 0 makes sense.
>>>
>>> On Sat, Feb 1, 2025 at 3:15 AM Anton Okolnychyi <aokolnyc...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I propose to clarify our delete counts handling in partition stats. We
>>>> have the following metrics that are marked as optional:
>>>>
>>>> - position_delete_record_count
>>>> - position_delete_file_count
>>>> - equality_delete_record_count
>>>> - equality_delete_file_count
>>>>
>>>> If I remember correctly, the reasoning behind this was that tables may
>>>> have no deletes, hence the counts are optional. The problem is that it
>>>> creates confusion for readers. Does null mean unknown or absent? I propose
>>>> we clarify that no counts means 0 for V1/V2 tables in the spec (this is the
>>>> current behavior in the Java implementation) and make the counts required
>>>> in V3.
>>>>
>>>> What does everybody think?
>>>>
>>>> - Anton
>>>>
>>>
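
For readers following the thread, the null-handling rules discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Iceberg implementation: the class `PartitionStatsSketch`, both method names, and all parameters are hypothetical, and the formula "data records minus DV cardinality" is one reading of "use DV cardinalities to populate that field".

```java
import java.util.OptionalLong;

// Hypothetical helper for illustration only; not part of the Iceberg API.
final class PartitionStatsSketch {

  private PartitionStatsSketch() {}

  // Delete counters: in V1/V2 a missing (null) counter is read as 0.
  // In V3 the proposal makes the counters required, so null is rejected.
  static long deleteCount(Long storedCount, int formatVersion) {
    if (storedCount != null) {
      return storedCount;
    }
    if (formatVersion >= 3) {
      throw new IllegalStateException("delete counts are required in V3");
    }
    return 0L;
  }

  // total_record_count stays optional in every version: empty means unknown,
  // never 0. Assumption: with only DVs present, the live row count is the
  // data record count minus the summed DV cardinality.
  static OptionalLong totalRecordCount(
      long dataRecordCount,
      long dvCardinality,
      boolean hasEqualityDeletes,
      boolean hasV2DeleteFiles) {
    if (hasEqualityDeletes || hasV2DeleteFiles) {
      // Computing the count would require scanning data, so report unknown.
      return OptionalLong.empty();
    }
    return OptionalLong.of(dataRecordCount - dvCardinality);
  }

  public static void main(String[] args) {
    System.out.println(deleteCount(null, 2));
    System.out.println(totalRecordCount(1000L, 25L, false, false));
    System.out.println(totalRecordCount(1000L, 25L, true, false));
  }
}
```

Returning `OptionalLong` rather than a sentinel keeps "unknown" distinct from a real count of 0, which is the distinction the thread is trying to preserve for `total_record_count`.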