alamb commented on code in PR #13651:
URL: https://github.com/apache/datafusion/pull/13651#discussion_r1894415438


##########
datafusion/expr/src/logical_plan/invariants.rs:
##########
@@ -281,13 +383,30 @@ fn check_mixed_out_refer_in_window(window: &Window) -> 
Result<()> {
     }
 }
 
+fn collect_subquery_cols(

Review Comment:
   this seems like a copy of the code in 
https://github.com/apache/datafusion/blob/344f0897d105cdd588d7a7ee8cef0646f844937a/datafusion/optimizer/src/utils.rs#L90-L89
   
   It would be nice if it were shared -- perhaps it could be moved in 
https://github.com/apache/datafusion/blob/2ac8af894f9aefb3215b74044b7f177b42f7e65f/datafusion/expr/src/utils.rs#L18-L17
 to be shared



##########
datafusion/optimizer/src/optimizer.rs:
##########
@@ -445,35 +462,38 @@ impl Optimizer {
             }
             i += 1;
         }
+
+        // verify LP is valid, after the last optimizer pass.

Review Comment:
   💯  -- I think this is a very good thing to also check



##########
datafusion/optimizer/src/optimizer.rs:
##########
@@ -384,9 +394,16 @@ impl Optimizer {
                     // rule handles recursion itself
                     None => optimize_plan_node(new_plan, rule.as_ref(), 
config),
                 }
-                // verify the rule didn't change the schema
                 .and_then(|tnr| {
-                    assert_schema_is_the_same(rule.name(), &starting_schema, 
&tnr.data)?;
+                    // verify after each optimizer pass.
+                    assert_valid_optimization(rule.name(), &tnr.data, 
&starting_schema)

Review Comment:
   To avoid performance regressions but still get the benefit of this check, I 
recommend changing this check to only run after each optimzer pass in debug 
mode (`#cfg(debug)`)



##########
datafusion/optimizer/src/optimizer.rs:
##########
@@ -445,35 +462,38 @@ impl Optimizer {
             }
             i += 1;
         }
+
+        // verify LP is valid, after the last optimizer pass.
+        new_plan
+            .check_invariants(InvariantLevel::Executable)
+            .map_err(|e| {
+                DataFusionError::Context(
+                    "check_plan_after_optimizers".to_string(),
+                    Box::new(e),
+                )
+            })?;
+
         log_plan("Final optimized plan", &new_plan);
         debug!("Optimizer took {} ms", start_time.elapsed().as_millis());
         Ok(new_plan)
     }
 }
 
-/// Returns an error if `new_plan`'s schema is different than `prev_schema`
+/// These are invariants which should hold true before and after each 
optimization.
 ///
-/// It ignores metadata and nullability.
-pub(crate) fn assert_schema_is_the_same(
+/// This differs from [`LogicalPlan::check_invariants`], which addresses if a 
singular
+/// LogicalPlan is valid. Instead this address if the optimization (before and 
after)
+/// is valid based upon permitted changes.
+fn assert_valid_optimization(

Review Comment:
   Since all this does is call `assert_expected_schema `I think it might be 
easier to understand if you changed the callsite to call 
`assert_expected_schema` directly



##########
datafusion/optimizer/src/analyzer/mod.rs:
##########
@@ -46,17 +42,24 @@ pub mod expand_wildcard_rule;
 pub mod function_rewrite;
 pub mod inline_table_scan;
 pub mod resolve_grouping_function;
-pub mod subquery;
 pub mod type_coercion;
 
+pub mod subquery {

Review Comment:
   👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to