adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2763043386
@alamb I've achieved 2 of the 3 goals:

- I added wrapping of a `DynamicFilterSource` in a `PhysicalExpr` such that it can dynamically update itself to prune rows using filter pushdown _even on a single file_. After various experiments I went with https://github.com/apache/datafusion/pull/15301/commits/1f2fcd832ce7432e9ecb23d698ebf83823f48405#diff-cb5bce55559bce1aacd37171f2501f660cd564eb83dac28d331e39bef98aa227, which is a hybrid of explicitly passing around a `DynamicFilterSource` and an opaque `PhysicalExpr`: the idea is to pass the dynamic filter source around explicitly so that operators can opt into it and do special handling, such as taking a snapshot for serializing across the wire or for `PruningPredicate`, but are still able to convert it to a `PhysicalExpr` when needed. The `PhysicalExpr` wrapper implementation needed quite a bit of care because, e.g., filter pushdown remaps column indices by rewriting children, so we need to do dynamic rewriting of children. But after it was all wired up I think it is pretty nice.
- I switched the dynamic aspect from polling (you ask `DynamicFilterSource` for new filters) to a push model (when the TopK updates, it pushes the new filters into the shared state). I think this should be more performant, and it completely removes locks on the TopK heap. This could be made even more efficient if we bring back the `supports_dynamic_filter_pushdown() -> bool` method, since we can avoid doing some work and setting up references / locks if we know ahead of time whether anything will need it.

I was not able to get the global TopK thing working because it's not actually a global TopK doing the work at the top level: it's a `SortPreservingMergeExec`, which I think in theory we can implement this optimization for, but this PR is large enough as is.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
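For readers following along, here is a minimal sketch of the push model described in the comment above, using only the Rust standard library. It is not the code in this PR: `SharedFilter`, `TopK`, and `run_scan` are hypothetical stand-ins, the filter is reduced to a single `i64` threshold, and the query is assumed to be `ORDER BY v ASC LIMIT k`. The point it illustrates is that the producer writes a new bound into shared state whenever its heap changes, and the consumer only ever reads that shared state, so nothing needs to lock the heap itself.

```rust
use std::collections::BinaryHeap;
use std::sync::{Arc, RwLock};

/// Hypothetical shared state: the bound a value must beat to enter the top K.
#[derive(Clone, Default)]
struct SharedFilter {
    threshold: Arc<RwLock<Option<i64>>>,
}

impl SharedFilter {
    /// Called by the producer (the TopK) when its heap changes.
    fn push(&self, new_threshold: i64) {
        *self.threshold.write().unwrap() = Some(new_threshold);
    }

    /// Called by the consumer (the scan); it never touches the heap.
    fn may_qualify(&self, value: i64) -> bool {
        match *self.threshold.read().unwrap() {
            Some(t) => value < t,
            None => true, // no bound published yet, so nothing can be pruned
        }
    }
}

/// Toy TopK that keeps the k smallest values seen so far.
struct TopK {
    k: usize,
    heap: BinaryHeap<i64>, // max-heap: the top is the largest kept value
    filter: SharedFilter,
}

impl TopK {
    fn new(k: usize, filter: SharedFilter) -> Self {
        assert!(k > 0, "this sketch assumes k > 0");
        Self { k, heap: BinaryHeap::new(), filter }
    }

    fn insert(&mut self, value: i64) {
        if self.heap.len() < self.k {
            self.heap.push(value);
        } else if value < *self.heap.peek().unwrap() {
            self.heap.pop();
            self.heap.push(value);
        } else {
            return; // heap unchanged, nothing new to publish
        }
        // Once the heap is full, push the new bound into the shared state.
        if self.heap.len() == self.k {
            self.filter.push(*self.heap.peek().unwrap());
        }
    }
}

/// Feeds values through the filter into the TopK; returns the kept values
/// in ascending order and the number of values pruned before reaching it.
fn run_scan(values: &[i64], k: usize) -> (Vec<i64>, usize) {
    let filter = SharedFilter::default();
    let mut topk = TopK::new(k, filter.clone());
    let mut pruned = 0;
    for &v in values {
        if filter.may_qualify(v) {
            topk.insert(v);
        } else {
            pruned += 1;
        }
    }
    (topk.heap.into_sorted_vec(), pruned)
}

fn main() {
    let (top, pruned) = run_scan(&[5, 1, 9, 3, 7, 2, 8], 3);
    println!("top = {:?}, pruned = {}", top, pruned);
}
```

In this toy run the values 7 and 8 are rejected by the shared threshold before they reach the TopK. A real implementation would publish a full filter expression rather than one integer, and the single-threaded loop here stands in for what would be separate operators.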
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org