adriangb commented on code in PR #20061:
URL: https://github.com/apache/datafusion/pull/20061#discussion_r2747636764


##########
datafusion/core/src/physical_planner.rs:
##########
@@ -1968,9 +2115,16 @@ fn extract_dml_filters(input: &Arc<LogicalPlan>) -> 
Result<Vec<Expr>> {
     let mut filters = Vec::new();
 
     input.apply(|node| {
-        if let LogicalPlan::Filter(filter) = node {
-            // Split AND predicates into individual expressions
-            
filters.extend(split_conjunction(&filter.predicate).into_iter().cloned());
+        match node {
+            LogicalPlan::Filter(filter) => {
+                // Split AND predicates into individual expressions
+                
filters.extend(split_conjunction(&filter.predicate).into_iter().cloned());
+            }
+            LogicalPlan::TableScan(scan) => {
+                // Also extract filters from TableScan (where they may be 
pushed down)
+                filters.extend(scan.filters.iter().cloned());

Review Comment:
   Hmm interesting. Is the problem the stripping of qualifiers? It seems to me 
that the current implementation is generally fragile: we should be taking into 
account the join aspect and only collect from subtrees of the join. In 
particular:
   
   ```
   > explain format indent UPDATE "trg" SET col = 1 FROM src WHERE trg.id = 
src.id AND src.type = 'active' AND trg.id > 100;
   
+---------------+-----------------------------------------------------------------------+
   | plan_type     | plan                                                       
           |
   
+---------------+-----------------------------------------------------------------------+
   | logical_plan  | Dml: op=[Update] table=[trg]                               
           |
   |               |   Projection: trg.id AS id, Utf8View("1") AS col           
           |
   |               |     Inner Join: trg.id = src.id                            
           |
   |               |       Filter: trg.id > Int32(100)                          
           |
   |               |         TableScan: trg projection=[id]                     
           |
   |               |       Projection: src.id                                   
           |
   |               |         Filter: src.type = Utf8View("active") AND src.id > 
Int32(100) |
   |               |           TableScan: src projection=[id, type]             
           |
   ```
   
   It seems to me that we should do something like:
   1. Find the table scan corresponding to the table being updated.
   2. Walk up the tree until we hit a join / subquery (?) / other blocker.
   3. Collect all filters in that subtree (ideally once this PR is across the 
line they've all been pushed into the `TableScan` so that becomes trivial)
   
   I don't know how the DML stuff is supposed to handle more complex cases 
involving `EXISTS`, etc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to