kosiew opened a new pull request, #17286:
URL: https://github.com/apache/datafusion/pull/17286
## Which issue does this PR close?

* Closes #17280.

## Rationale for this change

The accumulator previously collected build-side partition bounds and then **sorted** them with `sorted_by_key`, which:

* introduced **extra allocations**, and
* added **O(n log n)** overhead in the number of completed partitions.

Since partitions already have stable IDs, we can **pre-index** bounds by partition ID and avoid sorting entirely. This makes dynamic filter construction **O(n)** with fewer allocations, improves predictability, and eliminates a source of nondeterminism tied to completion order.

## What changes are included in this PR?

* Replaced `PartitionBounds` + `sorted_by_key` with a **preallocated `Vec<Option<Vec<ColumnBounds>>>`** indexed by partition ID.
* Eliminated sorting and the dependency on `itertools`, reducing allocations and algorithmic overhead.
* Updated the accumulator logic to:
  * **insert bounds in O(1)** at the correct index (by partition ID);
  * validate out-of-range partition IDs and return a clear internal error instead of panicking;
  * build the dynamic filter once **all partitions have reported**, ignoring missing partitions.
* Adjusted `create_filter_from_partition_bounds` to iterate the fixed-index vector and construct predicates without any intermediate sorting or allocation.
* Kept and clarified determinism as a by-product: completion order no longer affects the resulting predicate.

## Are these changes tested?

Yes.

* Added an async test `test_hashjoin_dynamic_filter_pushdown_out_of_order` that intentionally reverses the completion order of build-side partitions across runs and asserts that the resulting dynamic filter predicate string is identical, demonstrating order independence while validating the logic.
* Existing join and dynamic filter tests continue to pass.

## Are there any user-facing changes?

No API-breaking changes.

* Internals of dynamic filter construction were optimized for efficiency and determinism.
* Query semantics remain unchanged, but performance improves due to reduced allocations and removal of sorting overhead.
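The pre-indexing idea can be illustrated with a small standalone sketch. This is not the code from the PR: `BoundsAccumulator`, `AccumError`, `report`, and `build_filter` are hypothetical names, `ColumnBounds` here is a simplified stand-in that holds `i64` min/max values, and the predicate is rendered as a plain string rather than a DataFusion physical expression. It assumes each partition reports exactly once, with `None` meaning the partition produced no bounds.

```rust
/// Hypothetical stand-in for per-column min/max bounds (simplified to i64).
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnBounds {
    pub min: i64,
    pub max: i64,
}

/// Hypothetical error for a partition ID outside the preallocated range.
#[derive(Debug, PartialEq)]
pub enum AccumError {
    PartitionOutOfRange { partition: usize, num_partitions: usize },
}

/// Hypothetical accumulator: one preallocated slot per build-side partition,
/// indexed by partition ID, so no sorting is needed later.
pub struct BoundsAccumulator {
    slots: Vec<Option<Vec<ColumnBounds>>>,
    reported: usize,
}

impl BoundsAccumulator {
    pub fn new(num_partitions: usize) -> Self {
        Self { slots: vec![None; num_partitions], reported: 0 }
    }

    /// O(1) insert at the partition's own index. Returns `Ok(true)` once every
    /// partition has reported; an out-of-range ID is an error, not a panic.
    /// Assumes each partition calls this exactly once.
    pub fn report(
        &mut self,
        partition: usize,
        bounds: Option<Vec<ColumnBounds>>,
    ) -> Result<bool, AccumError> {
        let num_partitions = self.slots.len();
        let slot = self
            .slots
            .get_mut(partition)
            .ok_or(AccumError::PartitionOutOfRange { partition, num_partitions })?;
        *slot = bounds;
        self.reported += 1;
        Ok(self.reported == num_partitions)
    }

    /// Returns `None` until all partitions have reported. Slots are walked in
    /// partition-ID order, so completion order cannot change the output.
    /// Partitions without bounds are skipped; if none have bounds, the string is empty.
    pub fn build_filter(&self, columns: &[&str]) -> Option<String> {
        if self.reported < self.slots.len() {
            return None;
        }
        let disjuncts: Vec<String> = self
            .slots
            .iter()
            .flatten() // skip `None` slots (missing partitions)
            .map(|bounds| {
                let conj: Vec<String> = columns
                    .iter()
                    .zip(bounds)
                    .map(|(col, b)| format!("{col} >= {} AND {col} <= {}", b.min, b.max))
                    .collect();
                format!("({})", conj.join(" AND "))
            })
            .collect();
        Some(disjuncts.join(" OR "))
    }
}
```

Reporting partitions 0, 1, 2 or 2, 1, 0 fills the same slots, so both orders yield the same predicate; the design trades a one-time `Vec` of `num_partitions` slots for dropping the O(n log n) sort and the order dependence.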