asolimando commented on PR #19957:
URL: https://github.com/apache/datafusion/pull/19957#issuecomment-4046213633

   Rebased on latest `main` (includes #20846) and addressed all pending review 
comments. The new changes are confined to the 4 topmost commits; the rest is 
untouched apart from SHA changes caused by the rebase.
   
   @xudong963: replaced the `max(left, right)` merge heuristic with the 
overlap-based formula from #20846 ([comment with 
details](https://github.com/apache/datafusion/pull/19957#discussion_r2924005097))
   
   @gene-bordegaray:
   - Reverted NDV propagation in projections as suggested; I will address it in 
a follow-up with a proper expression statistics design ([comment with 
details](https://github.com/apache/datafusion/pull/19957#discussion_r2923949231))
   - Added 75% threshold for partial NDV extraction from Parquet: if fewer than 
75% of row groups report NDV, return `Absent` ([comment with 
details](https://github.com/apache/datafusion/pull/19957#discussion_r2923969032))
   - The `max()` merge rule concern is also addressed by the overlap formula 
above
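
   To make the 75% rule concrete, here is a minimal sketch of the coverage check. 
It is not the code in this PR: `reported_ndvs` is a hypothetical helper, `None` 
stands in for `Absent`, and the per-row-group NDVs are modelled as a plain slice 
of `Option<u64>`. How the reported values are combined once the threshold is met 
is left out on purpose.

```rust
/// Hypothetical helper: returns the NDVs that row groups reported, or `None`
/// (standing in for `Absent`) when fewer than 75% of row groups report one.
fn reported_ndvs(row_group_ndvs: &[Option<u64>]) -> Option<Vec<u64>> {
    let total = row_group_ndvs.len();
    if total == 0 {
        // No row groups at all: nothing to extract.
        return None;
    }

    // Keep only the row groups that actually carry an NDV.
    let reported: Vec<u64> = row_group_ndvs.iter().flatten().copied().collect();

    // Integer form of `reported / total < 0.75`, avoiding floating point.
    if reported.len() * 4 < total * 3 {
        return None;
    }

    Some(reported)
}

fn main() {
    // 3 of 4 row groups report an NDV: exactly 75%, so the values are kept.
    println!("{:?}", reported_ndvs(&[Some(10), Some(20), Some(30), None]));
    // 1 of 2 row groups report an NDV: below 75%, so the result is `None`.
    println!("{:?}", reported_ndvs(&[Some(10), None]));
}
```

   With this check, a file where exactly 75% of row groups report NDV still 
qualifies; only strictly lower coverage is rejected.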
   
   @jonathanc-n: the partial NDV threshold also addresses your comment 
([details](https://github.com/apache/datafusion/pull/19957#discussion_r2897575666)).
   
   Looking forward to hearing your thoughts!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

