[PR] fix: preserve duplicate GROUPING SETS rows [datafusion]

via GitHub Thu, 19 Mar 2026 09:29:55 -0700


xiedeyantu opened a new pull request, #21058:
URL: https://github.com/apache/datafusion/pull/21058


   ## Which issue does this PR close?
   
   - Closes #.
   
   ## Rationale for this change
   
   Duplicate grouping lists in `GROUPING SETS` were incorrectly collapsed during execution. The internal grouping id encoded only the null mask, so repeated grouping sets shared the same key and were merged. As a result, the query returned fewer rows than PostgreSQL does.
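
To illustrate the collision, here is a minimal, simplified model (not DataFusion's actual code). The helper names `null_mask` and `distinct_keys_by_mask` are hypothetical, and the bit convention (bit `i` set when column `i` is absent from the grouping set) is an assumption made for the sketch. It shows that when the key is the null mask alone, two identical grouping sets map to the same key:

```rust
use std::collections::HashSet;

/// Hypothetical helper: bit `i` is set when column `i` is NOT part of the
/// grouping set (that column is emitted as NULL for the set).
/// Assumes `num_columns < 64`.
fn null_mask(num_columns: usize, grouping_set: &[usize]) -> u64 {
    let mut mask = (1u64 << num_columns) - 1;
    for &col in grouping_set {
        mask &= !(1u64 << col);
    }
    mask
}

/// Counts how many distinct keys survive when the key is the null mask only.
fn distinct_keys_by_mask(num_columns: usize, grouping_sets: &[Vec<usize>]) -> usize {
    let keys: HashSet<u64> = grouping_sets
        .iter()
        .map(|set| null_mask(num_columns, set))
        .collect();
    keys.len()
}
```

With two columns and the grouping sets `[[0], [0], [1]]`, this model yields only two distinct keys for three grouping sets, so the rows of the repeated set are merged into one group.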
   
   I verified the expected result in PostgreSQL: for the full `emp` example, the same query returns 30 rows, with the duplicate grouping sets kept rather than collapsed.
   
   ## What changes are included in this PR?
   
   - Preserve duplicate grouping sets by encoding a per-occurrence ordinal in the internal grouping id used during execution.
   - Keep `GROUPING()` semantics unchanged.
   - Add regression coverage for the duplicate `GROUPING SETS` case in:
     - `datafusion/core/tests/sql/aggregates/basic.rs`
     - `datafusion/sqllogictest/test_files/group_by.slt`
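
The idea behind the first two bullets can be sketched as follows. This is an illustrative model only, not the code in this PR: the bit layout (null mask in the low 32 bits, ordinal in the high 32 bits) and the names `MASK_BITS`, `null_mask`, `grouping_ids`, and `grouping_value` are assumptions chosen for the example.

```rust
use std::collections::HashMap;

/// Assumed layout (illustrative only): the low 32 bits hold the null mask,
/// the high 32 bits hold the per-occurrence ordinal.
const MASK_BITS: u32 = 32;

/// Hypothetical helper: bit `i` is set when column `i` is NOT part of the
/// grouping set. Assumes `num_columns <= 32` so the mask fits in the low bits.
fn null_mask(num_columns: usize, grouping_set: &[usize]) -> u64 {
    let mut mask = (1u64 << num_columns) - 1;
    for &col in grouping_set {
        mask &= !(1u64 << col);
    }
    mask
}

/// Assigns each grouping set an id made of its null mask plus an ordinal that
/// counts how many times the same mask has already appeared.
fn grouping_ids(num_columns: usize, grouping_sets: &[Vec<usize>]) -> Vec<u64> {
    let mut occurrences: HashMap<u64, u64> = HashMap::new();
    grouping_sets
        .iter()
        .map(|set| {
            let mask = null_mask(num_columns, set);
            let ordinal = occurrences.entry(mask).or_insert(0);
            let id = (*ordinal << MASK_BITS) | mask;
            *ordinal += 1;
            id
        })
        .collect()
}

/// Recovers the value a `GROUPING()`-style function would see by dropping the
/// ordinal bits, so duplicates still report the same mask.
fn grouping_value(id: u64) -> u64 {
    id & ((1u64 << MASK_BITS) - 1)
}
```

In this model the repeated grouping sets get distinct ids and are therefore aggregated separately, while masking off the ordinal leaves the `GROUPING()` result identical for every occurrence.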
   
   ## Are these changes tested?
   
   - `cargo fmt --all`
   - `cargo test -p datafusion duplicate_grouping_sets_are_preserved -- --nocapture`
   - PostgreSQL validation against the same query/result shape
   
   ## Are there any user-facing changes?
   
   - Yes. Queries that contain duplicate `GROUPING SETS` entries now return result rows for every occurrence instead of merging them, matching PostgreSQL behavior.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

