alamb commented on code in PR #20858:
URL: https://github.com/apache/datafusion/pull/20858#discussion_r2917570430


##########
.github/workflows/extended.yml:
##########
@@ -154,11 +154,16 @@ jobs:
         uses: ./.github/actions/setup-builder
         with:
           rust-version: stable
-      - name: Run tests
+      - name: Run tests (force_hash_collisions)
         run: |
           cd datafusion
           cargo test  --profile ci --exclude datafusion-examples --exclude 
datafusion-benchmarks --exclude datafusion-sqllogictest --exclude 
datafusion-cli --workspace --lib --tests --features=force_hash_collisions,avro
           cargo clean
+      - name: Run tests (force_hash_partial_collisions, #20724)

Review Comment:
   Given how long our CI already runs for we probably want to avoid a new 
target (this will recompile a bunch of DataFusion even though it only runs one 
test)
   
   Another technique we might use to add coverage / find a reproducer here is 
fuzzing



##########
datafusion/physical-plan/src/aggregates/row_hash.rs:
##########
@@ -1267,6 +1267,18 @@ impl GroupedHashAggregateStream {
             // on the grouping columns.
             self.group_ordering = 
GroupOrdering::Full(GroupOrderingFull::new());
 
+            // Recreate group_values to use streaming mode 
(GroupValuesColumn<true>
+            // with scalarized_intern) which preserves input row order, as 
required
+            // by GroupOrderingFull. This is only needed for multi-column 
group by,
+            // since single-column uses GroupValuesPrimitive which is always 
safe.

Review Comment:
   ```suggestion
               // Recreate group_values as  streaming mode with multiple 
columns 
               // (GroupOrderingFull / GroupValuesColumn<true>)
               // assumes the ids are assigned in first-seen order.  
               // single-column grouping uses GroupValuesPrimitive which 
doesn't make
               // the same assumption
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to