xiedeyantu opened a new pull request, #21715:
URL: https://github.com/apache/datafusion/pull/21715

   ## Which issue does this PR close?
   
   - Closes #21507
   
   ## Rationale for this change
   
   Columns covered by a `UNIQUE` constraint may contain multiple `NULL` values, 
so the constraint does not guarantee row-level uniqueness in SQL semantics. The 
optimizer nevertheless treated nullable `UNIQUE` constraints as functional 
dependencies and used them to remove GROUP BY keys. Because GROUP BY puts all 
`NULL`s in one group, rows that shared a `NULL` key but differed in the removed 
columns were collapsed into a single group.
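For illustration only, the following Rust sketch uses plain standard-library code, not DataFusion's implementation. It models a hypothetical table `t(a INT UNIQUE, b INT)` in which `a` holds two `NULL`s. `Row`, `group_counts`, and `sample_rows` are hypothetical names. Grouping by `(a, b)` keeps the two `NULL` rows apart. Grouping by `a` alone, which is what dropping `b` through the nullable dependency amounts to, merges them, because `GROUP BY` treats all `NULL`s as one group.

```rust
use std::collections::BTreeMap;

/// One row of a hypothetical table `t(a INT UNIQUE, b INT)`.
/// `a` is nullable, so `None` models SQL `NULL`.
#[derive(Clone, Copy, Debug)]
struct Row {
    a: Option<i32>,
    b: i32,
}

/// Count rows per group, grouping by the key produced by `key_fn`.
/// As in SQL `GROUP BY`, all `None` keys compare equal and share one group.
fn group_counts<K: Ord, F: Fn(&Row) -> K>(rows: &[Row], key_fn: F) -> BTreeMap<K, usize> {
    let mut groups = BTreeMap::new();
    for row in rows {
        *groups.entry(key_fn(row)).or_insert(0) += 1;
    }
    groups
}

/// Sample data that satisfies `UNIQUE(a)`: the only repeated value of `a` is `NULL`.
fn sample_rows() -> Vec<Row> {
    vec![
        Row { a: Some(1), b: 10 },
        Row { a: None, b: 20 },
        Row { a: None, b: 30 },
    ]
}
```

With these rows, `group_counts(&rows, |r| (r.a, r.b))` produces three groups, while `group_counts(&rows, |r| r.a)` produces two, with a count of 2 under the `None` key.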
   
   ## What changes are included in this PR?
   
   This PR updates functional-dependency handling so nullable dependencies 
derived from `UNIQUE` constraints are not used to eliminate GROUP BY 
expressions. It also adds a regression test covering the `NULL` case from the 
issue report.
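The rule this PR enforces can be sketched as follows. This is a minimal sketch that assumes a simplified, hypothetical `Dependency` type and `reduce_group_by` helper. These are not DataFusion's actual `functional_dependencies` API. Under the sketch's rule, a GROUP BY column may be dropped only when a dependency that is not nullable implies it.

```rust
/// Hypothetical, simplified functional dependency: the columns in `source`
/// determine the columns in `target` (column indices into the input schema).
struct Dependency {
    source: Vec<usize>,
    target: Vec<usize>,
    /// True when the dependency comes from a `UNIQUE` constraint over nullable
    /// columns, whose source may hold several `NULL`s.
    nullable: bool,
}

/// Hypothetical helper: return the GROUP BY column indices that remain after
/// dropping columns implied by the first non-nullable dependency whose source
/// columns all appear in `group_by`. Nullable dependencies are skipped, because
/// GROUP BY would merge their `NULL` source values into one group.
fn reduce_group_by(group_by: &[usize], deps: &[Dependency]) -> Vec<usize> {
    for dep in deps.iter().filter(|d| !d.nullable) {
        if dep.source.iter().all(|c| group_by.contains(c)) {
            // Keep the source columns and any column the dependency does not cover.
            return group_by
                .iter()
                .copied()
                .filter(|c| dep.source.contains(c) || !dep.target.contains(c))
                .collect();
        }
    }
    // No usable dependency: the GROUP BY list is left unchanged.
    group_by.to_vec()
}
```

For example, with a nullable dependency `{0} -> {0, 1}`, `reduce_group_by(&[0, 1], ...)` keeps both columns. The same dependency marked non-nullable, such as one derived from a primary key, reduces the key to `[0]`.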
   
   ## Are these changes tested?
   
   Yes. I ran:
   - `cargo fmt --all`
   - `cargo clippy -p datafusion-common --all-targets -- -D warnings`
   - `cargo test -p datafusion-common functional_dependencies`
   - `cargo test -p datafusion-sqllogictest --test sqllogictests -- group_by`
   
   ## Are there any user-facing changes?
   
   Yes. Queries that group by nullable `UNIQUE` columns will no longer return 
incorrect aggregated results when those columns contain multiple `NULL` values.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


