peter-toth opened a new pull request, #10543:
URL: https://github.com/apache/datafusion/pull/10543

   ## Which issue does this PR close?
   
   Part of  https://github.com/apache/datafusion/issues/10505, required for 
https://github.com/apache/datafusion/issues/10426.
   
   ## Rationale for this change
   
   The current `TreeNode` visitor APIs (`TreeNode::visit()` and 
`TreeNode::apply()`) have a limiation due to the lifetimes of the `TreeNode` 
references passed to `TreeNodeVisitor::f_down()`, `TreeNodeVisitor::f_up()` and 
the `f` closure of `apply()` don't match the lifetime of the root `TreeNode` 
reference on which the APIs are called.
   This restriction means that APIs cant build up a data structure that 
contains references to the descendant treenodes.
   
   E.g. the following code snippet to collect how many times subexpressions 
occur in an expression tree doesn't work:
   ```
   let e = sum((col("a") * (lit(1) - col("b"))) * (lit(1) + col("c")));
   let mut m: HashMap<&Expr, usize> = HashMap::new();
   e.apply(|e| {
       *m.entry(e).or_insert(0) += 1;
       Ok(TreeNodeRecursion::Continue)
   });
   
   println!("m: {:#?}", m);
   ```
   
   This PR introduces new APIs (`TreeNode::apply_ref()` and 
`TreeNode::visit_ref()`) to `TreeNode`s where the lifetime of the references 
match the lifetime of the root `TreeNode` reference so the above example will 
work using `apply_ref()` instead of `apply()`.
   
   Please note:
   An alternative to adding the new APIs would be to change the current 
`TreeNode::apply()` and `TreeNode::visit()` APIs and make them stricter. This 
is a bit problematic due to:
   - `DynTreeNode`s implementations don't have any method to get references to 
a node's children.
     (I.e. we would need an `fn children(&self) -> Vec<&Arc<Self>>` instead of 
the current `fn arc_children(&self) -> Vec<Arc<Self>>`.)
   - The `LogicalPlan::apply_with_subqueries()` and 
`LogicalPlan::visit_with_subqueries()` APIs, that are similar to `TreeNode`'s 
base APIs but provide subquery support, can't be made stricter easily. This is 
because those 2 methods use the `LogicalPlan::apply_subqueries()` helper, whose 
implementation creates temporal `LogicalPlan::Subquery` objects that can't meet 
the lifetime requirements of the API.
     If we can't change `LogicalPlan::apply_with_subqueries()` and 
`LogicalPlan::visit_with_subqueries()` then probably it is better to keep 
`TreeNode::apply()` and `TreeNode::visit()` in sync with them, and introduce 
new `TreeNode` APIs with the stricter node reference lifetimes.
   
   ## What changes are included in this PR?
   
   - This PR introduces new `TreeNode::apply_ref()`, `TreeNode::visit_ref()` 
and `TreeNode::apply_children_ref()` APIs.
   - Because `TreeNode::apply_children_ref()` is stricter than current 
`TreeNode::apply_children()`, a default implementation of 
`TreeNode::apply_children()` is added to `TreeNode` and it points to 
`TreeNode::apply_children_ref()`.
   - Most of the current `TreeNode` implementations (`Expr`, `LogicalPlan`, 
`ConcreteTreeNode` are amended to implement the stricter 
`TreeNode::apply_children_ref()` instead of `TreeNode::apply_children()`.
   - As `DynTreeNode` and its implementation don't have a method to get 
references to a node's children, `DynTreeNode::apply_children()`'s 
implementation remains untouched in this PR, 
`DynTreeNode::apply_children_ref()` implementation panics and so the new APIs 
are not usable there.
   
   ## Are these changes tested?
   
   Yes, with new UTs.
   
   ## Are there any user-facing changes?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to