peter-toth opened a new pull request, #10543: URL: https://github.com/apache/datafusion/pull/10543
## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/10505, required for https://github.com/apache/datafusion/issues/10426. ## Rationale for this change The current `TreeNode` visitor APIs (`TreeNode::visit()` and `TreeNode::apply()`) have a limiation due to the lifetimes of the `TreeNode` references passed to `TreeNodeVisitor::f_down()`, `TreeNodeVisitor::f_up()` and the `f` closure of `apply()` don't match the lifetime of the root `TreeNode` reference on which the APIs are called. This restriction means that APIs cant build up a data structure that contains references to the descendant treenodes. E.g. the following code snippet to collect how many times subexpressions occur in an expression tree doesn't work: ``` let e = sum((col("a") * (lit(1) - col("b"))) * (lit(1) + col("c"))); let mut m: HashMap<&Expr, usize> = HashMap::new(); e.apply(|e| { *m.entry(e).or_insert(0) += 1; Ok(TreeNodeRecursion::Continue) }); println!("m: {:#?}", m); ``` This PR introduces new APIs (`TreeNode::apply_ref()` and `TreeNode::visit_ref()`) to `TreeNode`s where the lifetime of the references match the lifetime of the root `TreeNode` reference so the above example will work using `apply_ref()` instead of `apply()`. Please note: An alternative to adding the new APIs would be to change the current `TreeNode::apply()` and `TreeNode::visit()` APIs and make them stricter. This is a bit problematic due to: - `DynTreeNode`s implementations don't have any method to get references to a node's children. (I.e. we would need an `fn children(&self) -> Vec<&Arc<Self>>` instead of the current `fn arc_children(&self) -> Vec<Arc<Self>>`.) - The `LogicalPlan::apply_with_subqueries()` and `LogicalPlan::visit_with_subqueries()` APIs, that are similar to `TreeNode`'s base APIs but provide subquery support, can't be made stricter easily. This is because those 2 methods use the `LogicalPlan::apply_subqueries()` helper, whose implementation creates temporal `LogicalPlan::Subquery` objects that can't meet the lifetime requirements of the API. If we can't change `LogicalPlan::apply_with_subqueries()` and `LogicalPlan::visit_with_subqueries()` then probably it is better to keep `TreeNode::apply()` and `TreeNode::visit()` in sync with them, and introduce new `TreeNode` APIs with the stricter node reference lifetimes. ## What changes are included in this PR? - This PR introduces new `TreeNode::apply_ref()`, `TreeNode::visit_ref()` and `TreeNode::apply_children_ref()` APIs. - Because `TreeNode::apply_children_ref()` is stricter than current `TreeNode::apply_children()`, a default implementation of `TreeNode::apply_children()` is added to `TreeNode` and it points to `TreeNode::apply_children_ref()`. - Most of the current `TreeNode` implementations (`Expr`, `LogicalPlan`, `ConcreteTreeNode` are amended to implement the stricter `TreeNode::apply_children_ref()` instead of `TreeNode::apply_children()`. - As `DynTreeNode` and its implementation don't have a method to get references to a node's children, `DynTreeNode::apply_children()`'s implementation remains untouched in this PR, `DynTreeNode::apply_children_ref()` implementation panics and so the new APIs are not usable there. ## Are these changes tested? Yes, with new UTs. ## Are there any user-facing changes? No. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org