alamb commented on code in PR #20858:
URL: https://github.com/apache/datafusion/pull/20858#discussion_r2917570430
##########
.github/workflows/extended.yml:
##########
@@ -154,11 +154,16 @@ jobs:
uses: ./.github/actions/setup-builder
with:
rust-version: stable
- - name: Run tests
+ - name: Run tests (force_hash_collisions)
run: |
cd datafusion
cargo test --profile ci --exclude datafusion-examples --exclude
datafusion-benchmarks --exclude datafusion-sqllogictest --exclude
datafusion-cli --workspace --lib --tests --features=force_hash_collisions,avro
cargo clean
+ - name: Run tests (force_hash_partial_collisions, #20724)
Review Comment:
Given how long our CI already runs for we probably want to avoid a new
target (this will recompile a bunch of DataFusion even though it only runs one
test)
Another technique we might use to add coverage / find a reproducer here is
fuzzing
##########
datafusion/physical-plan/src/aggregates/row_hash.rs:
##########
@@ -1267,6 +1267,18 @@ impl GroupedHashAggregateStream {
// on the grouping columns.
self.group_ordering =
GroupOrdering::Full(GroupOrderingFull::new());
+ // Recreate group_values to use streaming mode
(GroupValuesColumn<true>
+ // with scalarized_intern) which preserves input row order, as
required
+ // by GroupOrderingFull. This is only needed for multi-column
group by,
+ // since single-column uses GroupValuesPrimitive which is always
safe.
Review Comment:
```suggestion
// Recreate group_values as streaming mode with multiple
columns
// (GroupOrderingFull / GroupValuesColumn<true>)
// assumes the ids are assigned in first-seen order.
// single-column grouping uses GroupValuesPrimitive which
doesn't make
// the same assumption
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]