wiedld commented on code in PR #13651:
URL: https://github.com/apache/datafusion/pull/13651#discussion_r1870608977


##########
datafusion/optimizer/src/optimizer.rs:
##########
@@ -451,6 +468,33 @@ impl Optimizer {
     }
 }
 
+/// These are invariants that must hold true for each logical plan.
+/// Performs the necessary checks and fails on an invalid plan.
+///
+/// Checks for elements which are immutable across optimizer passes.
+fn check_plan(
+    check_name: &str,
+    plan: &LogicalPlan,
+    prev_schema: Arc<DFSchema>,
+) -> Result<()> {
+    // verify invariant: optimizer rule didn't change the schema
+    assert_schema_is_the_same(check_name, &prev_schema, plan)?;
+
+    // verify invariant: fields must have unique names
+    assert_unique_field_names(plan)?;
+
+    /* This currently fails for:
+       - execution::context::tests::cross_catalog_access
+       - at test_files/string/string.slt:46
+               External error: query failed: DataFusion error: Optimizer rule 'eliminate_nested_union' failed
+    */
+    // verify invariant: equivalent schema across union inputs
+    // assert_unions_are_valid(check_name, plan)?;
+
+    // TODO: trait API and provide extension on the Optimizer to define own validations?

Review Comment:
   This TODO touches on making the invariants extensible. Options include:
   * for general invariants:
      * define them as checks run before/after each OptimizerRule, applied here in `check_plan()` (or equivalent code)
      * provide `Optimizer.invariants = Vec<Arc<dyn InvariantCheck>>` for user-defined invariants
   * for invariants specific to a given OptimizerRule:
      * provide `OptimizerRule::check_invariants()` so that certain invariants are only checked for that rule (instead of for all rules)
      * for a user-defined OptimizerRule, users could also check their own invariants
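
To make the two options above concrete, here is a minimal, self-contained sketch. It assumes a drastically simplified `LogicalPlan` that only carries output field names, and it runs checks after (not before) each rule. `InvariantCheck`, `Optimizer.invariants`, and `OptimizerRule::check_invariants()` are the hypothetical names floated in this comment, not existing DataFusion APIs; `with_rule`, `with_invariant`, `NoopRule`, and `DuplicateFirstField` are illustrative only.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for DataFusion's `LogicalPlan`: it only carries the
/// output field names so that this sketch compiles on its own.
#[derive(Debug, Clone)]
struct LogicalPlan {
    field_names: Vec<String>,
}

/// Hypothetical trait for a general, user-defined invariant.
trait InvariantCheck {
    fn name(&self) -> &str;
    fn check(&self, plan: &LogicalPlan) -> Result<(), String>;
}

/// General invariant: output fields must have unique names.
struct UniqueFieldNames;
impl InvariantCheck for UniqueFieldNames {
    fn name(&self) -> &str { "unique_field_names" }
    fn check(&self, plan: &LogicalPlan) -> Result<(), String> {
        let mut seen = HashSet::new();
        for name in &plan.field_names {
            // `insert` returns false if the name was already present.
            if !seen.insert(name.as_str()) {
                return Err(format!("duplicate field name: {}", name));
            }
        }
        Ok(())
    }
}

/// Hypothetical, heavily simplified rule trait with a per-rule invariant hook.
trait OptimizerRule {
    fn name(&self) -> &str;
    fn rewrite(&self, plan: LogicalPlan) -> LogicalPlan;
    /// Rule-specific invariants; by default a rule checks nothing extra.
    fn check_invariants(&self, _plan: &LogicalPlan) -> Result<(), String> {
        Ok(())
    }
}

#[derive(Default)]
struct Optimizer {
    rules: Vec<Arc<dyn OptimizerRule>>,
    /// General invariants, checked after every rule.
    invariants: Vec<Arc<dyn InvariantCheck>>,
}

impl Optimizer {
    fn with_rule(mut self, rule: impl OptimizerRule + 'static) -> Self {
        self.rules.push(Arc::new(rule));
        self
    }
    fn with_invariant(mut self, inv: impl InvariantCheck + 'static) -> Self {
        self.invariants.push(Arc::new(inv));
        self
    }
    fn optimize(&self, mut plan: LogicalPlan) -> Result<LogicalPlan, String> {
        for rule in &self.rules {
            plan = rule.rewrite(plan);
            // General invariants apply to every rule's output.
            for inv in &self.invariants {
                inv.check(&plan).map_err(|e| {
                    format!("rule '{}' broke invariant '{}': {}", rule.name(), inv.name(), e)
                })?;
            }
            // Rule-specific invariants only run for the rule that defines them.
            rule.check_invariants(&plan)
                .map_err(|e| format!("rule '{}' broke its own invariant: {}", rule.name(), e))?;
        }
        Ok(plan)
    }
}

/// Example rule: leaves the plan unchanged, but requires a non-empty schema.
struct NoopRule;
impl OptimizerRule for NoopRule {
    fn name(&self) -> &str { "noop" }
    fn rewrite(&self, plan: LogicalPlan) -> LogicalPlan { plan }
    fn check_invariants(&self, plan: &LogicalPlan) -> Result<(), String> {
        if plan.field_names.is_empty() { Err("empty schema".to_string()) } else { Ok(()) }
    }
}

/// Deliberately buggy example rule: appends a copy of the first field.
struct DuplicateFirstField;
impl OptimizerRule for DuplicateFirstField {
    fn name(&self) -> &str { "duplicate_first_field" }
    fn rewrite(&self, mut plan: LogicalPlan) -> LogicalPlan {
        if let Some(first) = plan.field_names.first().cloned() {
            plan.field_names.push(first);
        }
        plan
    }
}
```

Checking right after each rule means the error can name the rule that broke the invariant, at the cost of running every registered check once per rule.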
   
   Ditto for the AnalyzerRule passes. That said, I wasn't sure how much added complexity and planning-time overhead this would introduce; as @Omega359 mentions, we could [make it configurable](https://github.com/apache/datafusion/pull/13651#issuecomment-2519006923) (e.g. enable the checks in CI and for debugging in downstream projects).
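
One way the checks could be made configurable, as a sketch only: `InvariantCheckMode` and its variants are hypothetical names, not an existing DataFusion config option, and the plan is again reduced to a list of field names. Defaulting on `cfg!(debug_assertions)` is just one possible way to enable the checks in CI and debug builds while skipping them in release builds.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a plan: just its output field names.
type Plan = Vec<String>;
/// A named rewrite step (stand-in for an `OptimizerRule`).
type Rule = (&'static str, fn(Plan) -> Plan);

/// Hypothetical setting for when invariant checks run.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InvariantCheckMode {
    /// No checks: zero planning-time overhead.
    Off,
    /// One check after all rules: cheap, but cannot name the offending rule.
    AfterAllRules,
    /// A check after every rule: most overhead, most precise error.
    AfterEachRule,
}

impl InvariantCheckMode {
    /// One possible default: check every rule in debug builds (e.g. CI), none in release.
    fn default_for_build() -> Self {
        if cfg!(debug_assertions) {
            Self::AfterEachRule
        } else {
            Self::Off
        }
    }
}

/// Example invariant: output fields must have unique names.
fn unique_field_names(plan: &Plan) -> Result<(), String> {
    let mut seen = HashSet::new();
    for name in plan {
        if !seen.insert(name.as_str()) {
            return Err(format!("duplicate field name: {}", name));
        }
    }
    Ok(())
}

fn optimize(mut plan: Plan, rules: &[Rule], mode: InvariantCheckMode) -> Result<Plan, String> {
    for &(name, rewrite) in rules {
        plan = rewrite(plan);
        if mode == InvariantCheckMode::AfterEachRule {
            unique_field_names(&plan).map_err(|e| format!("after rule '{}': {}", name, e))?;
        }
    }
    if mode == InvariantCheckMode::AfterAllRules {
        unique_field_names(&plan).map_err(|e| format!("after all rules: {}", e))?;
    }
    Ok(plan)
}

/// Example rules: a no-op and a deliberately buggy one that duplicates a field.
fn keep(plan: Plan) -> Plan {
    plan
}
fn duplicate_first(mut plan: Plan) -> Plan {
    if let Some(first) = plan.first().cloned() {
        plan.push(first);
    }
    plan
}
```

The middle option trades precision for cost: checking once at the end reports that some rule broke an invariant, while checking after every rule pinpoints which one did.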
   
   This WIP proposes different ideas for what we could do. 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
