alamb commented on code in PR #13651: URL: https://github.com/apache/datafusion/pull/13651#discussion_r1894415438
########## datafusion/expr/src/logical_plan/invariants.rs: ########## @@ -281,13 +383,30 @@ fn check_mixed_out_refer_in_window(window: &Window) -> Result<()> { } } +fn collect_subquery_cols( Review Comment: this seems like a copy of the code in https://github.com/apache/datafusion/blob/344f0897d105cdd588d7a7ee8cef0646f844937a/datafusion/optimizer/src/utils.rs#L90-L89 It would be nice if it were shared -- perhaps it could be moved in https://github.com/apache/datafusion/blob/2ac8af894f9aefb3215b74044b7f177b42f7e65f/datafusion/expr/src/utils.rs#L18-L17 to be shared ########## datafusion/optimizer/src/optimizer.rs: ########## @@ -445,35 +462,38 @@ impl Optimizer { } i += 1; } + + // verify LP is valid, after the last optimizer pass. Review Comment: 💯 -- I think this is a very good thing to also check ########## datafusion/optimizer/src/optimizer.rs: ########## @@ -384,9 +394,16 @@ impl Optimizer { // rule handles recursion itself None => optimize_plan_node(new_plan, rule.as_ref(), config), } - // verify the rule didn't change the schema .and_then(|tnr| { - assert_schema_is_the_same(rule.name(), &starting_schema, &tnr.data)?; + // verify after each optimizer pass. + assert_valid_optimization(rule.name(), &tnr.data, &starting_schema) Review Comment: To avoid performance regressions but still get the benefit of this check, I recommend changing this check to only run after each optimzer pass in debug mode (`#cfg(debug)`) ########## datafusion/optimizer/src/optimizer.rs: ########## @@ -445,35 +462,38 @@ impl Optimizer { } i += 1; } + + // verify LP is valid, after the last optimizer pass. + new_plan + .check_invariants(InvariantLevel::Executable) + .map_err(|e| { + DataFusionError::Context( + "check_plan_after_optimizers".to_string(), + Box::new(e), + ) + })?; + log_plan("Final optimized plan", &new_plan); debug!("Optimizer took {} ms", start_time.elapsed().as_millis()); Ok(new_plan) } } -/// Returns an error if `new_plan`'s schema is different than `prev_schema` +/// These are invariants which should hold true before and after each optimization. /// -/// It ignores metadata and nullability. -pub(crate) fn assert_schema_is_the_same( +/// This differs from [`LogicalPlan::check_invariants`], which addresses if a singular +/// LogicalPlan is valid. Instead this address if the optimization (before and after) +/// is valid based upon permitted changes. +fn assert_valid_optimization( Review Comment: Since all this does is call `assert_expected_schema `I think it might be easier to understand if you changed the callsite to call `assert_expected_schema` directly ########## datafusion/optimizer/src/analyzer/mod.rs: ########## @@ -46,17 +42,24 @@ pub mod expand_wildcard_rule; pub mod function_rewrite; pub mod inline_table_scan; pub mod resolve_grouping_function; -pub mod subquery; pub mod type_coercion; +pub mod subquery { Review Comment: 👍 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org