Lordworms commented on code in PR #13054:
URL: https://github.com/apache/datafusion/pull/13054#discussion_r1833251081
##########
datafusion/physical-plan/src/joins/hash_join.rs:
##########
@@ -1406,12 +1403,24 @@ impl HashJoinStream {
self.hashes_buffer.resize(batch.num_rows(), 0);
create_hashes(&keys_values, &self.random_state, &mut self.hashes_buffer)?;
+ let (filtered_batch, filtered_hashes) =
+ if let Some(dynamic_filter) = &self.dynamic_filter_info {
+ dynamic_filter.filter_probe_batch(
Review Comment:
> Hmm, `filter_probe_batch` shouldn't be added as filter on the `hash_join`
> but rather as `PhysicalExpr` (returning boolean) to filter the `ParquetExec`

Sure, I'll refactor that. Separately, after rebasing onto main I ran into a
problem with ParquetExec: https://github.com/apache/datafusion/issues/13298.
If pushing the filter down costs much more than applying it directly in the
join, I am not sure whether it still counts as an 'optimization'...
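
To make the suggested direction concrete, here is a minimal sketch of a dynamic filter evaluated as a boolean predicate at scan time rather than inside the join. It uses only the standard library; `DynamicKeyFilter`, `from_build_side`, `evaluate`, and `scan_with_filter` are hypothetical names for illustration, not DataFusion APIs, and a plain `HashSet<i64>` stands in for whatever the real filter would track.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a dynamic join filter; not a DataFusion type.
/// It holds the distinct join keys seen on the build side.
struct DynamicKeyFilter {
    build_keys: HashSet<i64>,
}

impl DynamicKeyFilter {
    /// Collect the build-side keys once the build side is complete.
    fn from_build_side(keys: &[i64]) -> Self {
        Self {
            build_keys: keys.iter().copied().collect(),
        }
    }

    /// Evaluate as a boolean predicate over a column of probe-side keys:
    /// one `true`/`false` per row, like a boolean-returning expression.
    fn evaluate(&self, probe_keys: &[i64]) -> Vec<bool> {
        probe_keys
            .iter()
            .map(|k| self.build_keys.contains(k))
            .collect()
    }
}

/// Hypothetical scan step: keep only rows whose mask entry is `true`,
/// so rows that cannot match never reach the join.
fn scan_with_filter(
    rows: &[(i64, &'static str)],
    filter: &DynamicKeyFilter,
) -> Vec<(i64, &'static str)> {
    let keys: Vec<i64> = rows.iter().map(|(k, _)| *k).collect();
    let mask = filter.evaluate(&keys);
    rows.iter()
        .zip(mask)
        .filter(|(_, keep)| *keep)
        .map(|(row, _)| *row)
        .collect()
}
```

In this sketch the join itself never sees the filter. Whether that is a net win depends on how expensive the predicate is to evaluate at the scan, which is the concern raised above.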
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]