alamb commented on code in PR #13651:
URL: https://github.com/apache/datafusion/pull/13651#discussion_r1894415438
##########
datafusion/expr/src/logical_plan/invariants.rs:
##########
@@ -281,13 +383,30 @@ fn check_mixed_out_refer_in_window(window: &Window) -> Result<()> {
}
}
+fn collect_subquery_cols(
Review Comment:
This looks like a copy of the code in
https://github.com/apache/datafusion/blob/344f0897d105cdd588d7a7ee8cef0646f844937a/datafusion/optimizer/src/utils.rs#L90-L89
It would be nice to have a single shared implementation. Perhaps it could be moved into
https://github.com/apache/datafusion/blob/2ac8af894f9aefb3215b74044b7f177b42f7e65f/datafusion/expr/src/utils.rs#L18-L17
so that both call sites can use it.
##########
datafusion/optimizer/src/optimizer.rs:
##########
@@ -445,35 +462,38 @@ impl Optimizer {
}
i += 1;
}
+
+ // verify LP is valid, after the last optimizer pass.
Review Comment:
💯 -- I think this is a very good thing to also check
##########
datafusion/optimizer/src/optimizer.rs:
##########
@@ -384,9 +394,16 @@ impl Optimizer {
// rule handles recursion itself
None => optimize_plan_node(new_plan, rule.as_ref(), config),
}
- // verify the rule didn't change the schema
.and_then(|tnr| {
- assert_schema_is_the_same(rule.name(), &starting_schema, &tnr.data)?;
+ // verify after each optimizer pass.
+ assert_valid_optimization(rule.name(), &tnr.data, &starting_schema)
Review Comment:
To avoid performance regressions but still get the benefit of this check, I
recommend running it after each optimizer pass only in debug builds
(`#[cfg(debug_assertions)]`)
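
A minimal sketch of what a debug-only check could look like, assuming simplified stand-in types. `Plan`, `check_schema_unchanged`, and `apply_rule` are hypothetical names used for illustration only; they are not the DataFusion API or the code in this PR.

```rust
// Hypothetical stand-in for a logical plan; NOT the DataFusion `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
struct Plan {
    schema: Vec<String>,
}

/// Hypothetical invariant: a rule must not change the plan's schema.
fn check_schema_unchanged(rule: &str, before: &[String], after: &Plan) -> Result<(), String> {
    if before == after.schema.as_slice() {
        Ok(())
    } else {
        Err(format!("rule {} changed the schema", rule))
    }
}

/// Applies one rewrite and verifies the result only in debug builds.
fn apply_rule(rule: &str, plan: Plan, rewrite: impl Fn(Plan) -> Plan) -> Result<Plan, String> {
    // Note: this clone still happens in release builds in this simple sketch.
    let starting_schema = plan.schema.clone();
    let new_plan = rewrite(plan);
    // `cfg!(debug_assertions)` is a compile-time constant: true in debug
    // builds and false in release builds (unless overridden in the profile).
    if cfg!(debug_assertions) {
        check_schema_unchanged(rule, &starting_schema, &new_plan)?;
    }
    Ok(new_plan)
}

fn main() {
    let plan = Plan { schema: vec!["a".to_string(), "b".to_string()] };

    // A rule that leaves the schema alone passes in every build mode.
    let ok = apply_rule("noop", plan.clone(), |p| p);
    println!("noop: {:?}", ok);

    // A rule that drops a column is only rejected when the check is compiled in.
    let bad = apply_rule("drop_column", plan, |mut p| {
        p.schema.pop();
        p
    });
    println!("drop_column: {:?}", bad);
}
```

The trade-off is that a broken rule would only be caught in debug builds and tests; the single check after the last optimizer pass would still run in release builds.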
##########
datafusion/optimizer/src/optimizer.rs:
##########
@@ -445,35 +462,38 @@ impl Optimizer {
}
i += 1;
}
+
+ // verify LP is valid, after the last optimizer pass.
+ new_plan
+ .check_invariants(InvariantLevel::Executable)
+ .map_err(|e| {
+ DataFusionError::Context(
+ "check_plan_after_optimizers".to_string(),
+ Box::new(e),
+ )
+ })?;
+
log_plan("Final optimized plan", &new_plan);
debug!("Optimizer took {} ms", start_time.elapsed().as_millis());
Ok(new_plan)
}
}
-/// Returns an error if `new_plan`'s schema is different than `prev_schema`
+/// These are invariants which should hold true before and after each optimization.
///
-/// It ignores metadata and nullability.
-pub(crate) fn assert_schema_is_the_same(
+/// This differs from [`LogicalPlan::check_invariants`], which addresses if a singular
+/// LogicalPlan is valid. Instead this address if the optimization (before and after)
+/// is valid based upon permitted changes.
+fn assert_valid_optimization(
Review Comment:
Since all this does is call `assert_expected_schema`, I think it might be
easier to understand if you changed the call site to call
`assert_expected_schema` directly
##########
datafusion/optimizer/src/analyzer/mod.rs:
##########
@@ -46,17 +42,24 @@ pub mod expand_wildcard_rule;
pub mod function_rewrite;
pub mod inline_table_scan;
pub mod resolve_grouping_function;
-pub mod subquery;
pub mod type_coercion;
+pub mod subquery {
Review Comment:
👍
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]