asolimando commented on PR #19957: URL: https://github.com/apache/datafusion/pull/19957#issuecomment-4046213633
Rebased on latest `main` (includes #20846) and addressed all pending review comments (new changes are in the 4 topmost commits only, the rest is untouched modulo SHA changes for the rebase).

@xudong963: replaced the `max(left, right)` merge heuristic with the overlap-based formula from #20846 ([comment with details](https://github.com/apache/datafusion/pull/19957#discussion_r2924005097))

@gene-bordegaray:
- Reverted NDV propagation in projections as suggested, will address in a follow-up with proper expression statistics design ([comment with details](https://github.com/apache/datafusion/pull/19957#discussion_r2923949231))
- Added 75% threshold for partial NDV extraction from Parquet: if fewer than 75% of row groups report NDV, return `Absent` ([comment with details](https://github.com/apache/datafusion/pull/19957#discussion_r2923969032))
- The `max()` merge rule concern is also addressed by the overlap formula above

@jonathanc-n: partial NDV threshold addresses your [comment with details](https://github.com/apache/datafusion/pull/19957#discussion_r2897575666) as well.

Looking forward to hearing your thoughts!
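
For readers skimming the thread, the 75% coverage rule for partial NDV extraction can be sketched roughly as below. This is only an illustration and not the code in the PR: the function name `has_sufficient_ndv_coverage`, the `Option<u64>` representation of per-row-group NDV values, and the treatment of an empty input as insufficient are all assumptions.

```rust
/// Hypothetical helper (not the PR's actual code): returns `true` when at
/// least 75% of the row groups report an NDV value. When it returns `false`,
/// the caller would report the column's NDV as `Absent`.
fn has_sufficient_ndv_coverage(row_group_ndvs: &[Option<u64>]) -> bool {
    let total = row_group_ndvs.len();
    // Assumption: with no row groups there is nothing to extract from.
    if total == 0 {
        return false;
    }
    // Count the row groups that actually carry an NDV statistic.
    let reported = row_group_ndvs.iter().filter(|ndv| ndv.is_some()).count();
    // Integer form of `reported / total >= 0.75`, avoiding float rounding.
    reported * 4 >= total * 3
}
```

With four row groups, three reporting NDV (exactly 75%) passes the check, while two reporting does not. How the reported per-row-group values are then combined is a separate question and is not covered by this sketch.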
