asolimando opened a new pull request, #21077:
URL: https://github.com/apache/datafusion/pull/21077

   ## Which issue does this PR close?
   
   - Part of #20766
   
   Related: #20789, which uses NDV for equality filter selectivity. The two PRs 
are complementary: this one improves the NDV output statistics, and that one 
consumes them.
   
   ## Rationale for this change
   
   When a filter predicate collapses a column interval to a single value (e.g. 
`d_qoy = 1`), the output column can have only one distinct value. Currently, 
the filter always demotes `distinct_count` to `Inexact` and keeps the input 
NDV, so this information is lost.
   
   This matters for downstream optimizers that rely on `distinct_count`, such 
as join cardinality estimation in `estimate_inner_join_cardinality`.
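   
   To illustrate why a stale NDV matters, here is a minimal sketch of the 
textbook inner-join estimate `|L| * |R| / max(ndv_L, ndv_R)`. The helper 
`estimate_join_rows` and all numbers are hypothetical, and the formula may 
differ from what `estimate_inner_join_cardinality` actually computes.

```rust
/// Textbook inner-join row estimate: |L| * |R| / max(ndv_L, ndv_R).
/// Hypothetical helper for illustration only; not a DataFusion API.
fn estimate_join_rows(left_rows: u64, right_rows: u64, left_ndv: u64, right_ndv: u64) -> u64 {
    // Clamp to 1 so the division is defined even if both NDVs are reported as 0.
    let max_ndv = left_ndv.max(right_ndv).max(1);
    left_rows * right_rows / max_ndv
}
```

   Under these assumptions, a filtered side with 250 rows joined to 1000 rows 
is estimated at 62,500 rows when the pre-filter NDV of 4 is kept, and at 
250,000 rows when the NDV is 1 on both sides.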
   
   ## What changes are included in this PR?
   
   In `collect_new_statistics` (filter.rs), when the post-filter interval has 
`lower == upper` (both non-null), set `distinct_count` to `Precision::Exact(1)` 
instead of demoting the input NDV to `Inexact`.
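   
   The rule can be sketched as follows. This is a simplified stand-in, not the 
code in `collect_new_statistics`: the `Precision` enum below only mimics the 
shape of DataFusion's type, `post_filter_ndv` is a hypothetical name, and the 
bounds are modeled as `Option<i64>` with `None` standing for a null bound.

```rust
/// Simplified stand-in for a statistics precision wrapper (assumption:
/// only the three variants matter for this illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

impl Precision {
    /// Demote an exact value to inexact; other variants are unchanged.
    fn to_inexact(self) -> Self {
        match self {
            Precision::Exact(n) => Precision::Inexact(n),
            other => other,
        }
    }
}

/// Hypothetical helper: pick the post-filter distinct count for one column.
/// `lower`/`upper` are the post-filter interval bounds; `None` means null.
fn post_filter_ndv(input_ndv: Precision, lower: Option<i64>, upper: Option<i64>) -> Precision {
    match (lower, upper) {
        // Interval collapsed to a single non-null value: exactly one distinct value.
        (Some(lo), Some(hi)) if lo == hi => Precision::Exact(1),
        // Otherwise keep the previous behavior: demote the input NDV.
        _ => input_ndv.to_inexact(),
    }
}
```

   For example, `post_filter_ndv(Precision::Exact(4), Some(1), Some(1))` yields 
`Precision::Exact(1)`, while a non-collapsed interval such as `[1, 3]` yields 
`Precision::Inexact(4)`.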
   
   ## Are these changes tested?
   
   Yes, 4 unit tests:
   - Equality predicate (`a = 42`) -> NDV becomes `Exact(1)`
   - OR predicate (`a = 42 OR a = 22`) -> interval does not collapse, NDV stays 
`Inexact`
   - AND with mixed predicates (`a = 42 AND b > 10`) -> `a` gets `Exact(1)`, 
`b` stays `Inexact`
   - Equality with absent bounds (`a = 42`, no min/max) -> interval analysis 
still resolves to `Exact(1)`
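   
   The equality, OR, and absent-bounds cases can be reproduced with a toy 
interval model. This is only a sketch under simplifying assumptions: 
`ToyInterval` and its methods are hypothetical, use plain `i64` bounds, and 
model OR as the hull of both branches. It is not DataFusion's interval 
arithmetic.

```rust
/// Toy closed interval; `None` means the bound is unknown (unbounded).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ToyInterval {
    lower: Option<i64>,
    upper: Option<i64>,
}

impl ToyInterval {
    /// Column with no min/max statistics.
    const UNBOUNDED: ToyInterval = ToyInterval { lower: None, upper: None };

    /// Narrow by `col = v`: intersect with the point interval [v, v].
    fn and_eq(self, v: i64) -> ToyInterval {
        ToyInterval {
            lower: Some(self.lower.map_or(v, |l| l.max(v))),
            upper: Some(self.upper.map_or(v, |u| u.min(v))),
        }
    }

    /// Model `p OR q` as the smallest interval covering both branches.
    fn hull(self, other: ToyInterval) -> ToyInterval {
        ToyInterval {
            lower: match (self.lower, other.lower) {
                (Some(a), Some(b)) => Some(a.min(b)),
                _ => None,
            },
            upper: match (self.upper, other.upper) {
                (Some(a), Some(b)) => Some(a.max(b)),
                _ => None,
            },
        }
    }

    /// True when both bounds are known and equal.
    fn is_single_value(&self) -> bool {
        matches!((self.lower, self.upper), (Some(l), Some(u)) if l == u)
    }
}
```

   In this model, `a = 42` collapses both a bounded and an unbounded input 
interval to `[42, 42]`, whereas `a = 42 OR a = 22` widens to `[22, 42]` and 
therefore does not qualify for `Exact(1)`.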
   
   ## Are there any user-facing changes?
   
   No breaking changes. Statistics consumers will now see `Exact(1)` for 
`distinct_count` on columns constrained to a single value by filter predicates.
   
   Disclaimer: I used AI to assist with code generation. I have manually 
reviewed the output, and it matches my intention and understanding.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

