asolimando opened a new pull request, #21077: URL: https://github.com/apache/datafusion/pull/21077
## Which issue does this PR close?

- Part of #20766

Related: #20789 (uses NDV for equality filter selectivity, complementary - this PR improves the NDV output stats, that PR consumes them)

## Rationale for this change

When a filter predicate collapses a column interval to a single value (e.g. `d_qoy = 1`), the output column can only have one distinct value. Currently `distinct_count` is always demoted to `Inexact`, losing this information. This matters for downstream optimizers that rely on `distinct_count`, such as join cardinality estimation in `estimate_inner_join_cardinality`.

## What changes are included in this PR?

In `collect_new_statistics` (filter.rs), when the post-filter interval has `lower == upper` (both non-null), set `distinct_count` to `Precision::Exact(1)` instead of demoting the input NDV to `Inexact`.

## Are these changes tested?

Yes, 4 unit tests:

- Equality predicate (`a = 42`) -> NDV becomes `Exact(1)`
- OR predicate (`a = 42 OR a = 22`) -> interval does not collapse, NDV stays `Inexact`
- AND with mixed predicates (`a = 42 AND b > 10`) -> `a` gets `Exact(1)`, `b` stays `Inexact`
- Equality with absent bounds (`a = 42`, no min/max) -> interval analysis still resolves to `Exact(1)`

## Are there any user-facing changes?

No breaking changes. Statistics consumers will now see `Exact(1)` for `distinct_count` on columns constrained to a single value by filter predicates.

Disclaimer: I used AI to assist in the code generation, I have manually reviewed the output and it matches my intention and understanding.
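For illustration, the rule described under "What changes are included in this PR?" can be sketched roughly as below. This is not the code from the PR: the `Precision` enum here is a simplified stand-in for DataFusion's type, and `Bounds` and `post_filter_distinct_count` are hypothetical names introduced only for this sketch, with `i64` bounds assumed in place of scalar values.

```rust
/// Simplified stand-in for DataFusion's `Precision` (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

impl Precision {
    /// Demote an exact value to an estimate; inexact and absent values are unchanged.
    fn to_inexact(self) -> Precision {
        match self {
            Precision::Exact(n) => Precision::Inexact(n),
            other => other,
        }
    }
}

/// Hypothetical post-filter interval of one column; `None` models a null bound.
struct Bounds {
    lower: Option<i64>,
    upper: Option<i64>,
}

/// Assumed shape of the rule: if the interval collapsed to a single non-null
/// value, exactly one distinct value can remain; otherwise fall back to
/// demoting the input NDV to an inexact estimate.
fn post_filter_distinct_count(input_ndv: Precision, bounds: &Bounds) -> Precision {
    match (bounds.lower, bounds.upper) {
        (Some(lo), Some(hi)) if lo == hi => Precision::Exact(1),
        _ => input_ndv.to_inexact(),
    }
}
```

Under these assumptions, a predicate like `a = 42` yields bounds of `42..=42` and therefore `Exact(1)`, while `a = 42 OR a = 22` leaves a non-degenerate interval and the input NDV is only demoted to `Inexact`.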
