alamb commented on code in PR #13651:
URL: https://github.com/apache/datafusion/pull/13651#discussion_r1887126223


##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -1125,6 +1126,14 @@ impl LogicalPlan {
         }
     }
 
+    /// These are invariants to hold true for each logical plan.

Review Comment:
   See suggestion below



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -1125,6 +1126,14 @@ impl LogicalPlan {
         }
     }
 
+    /// These are invariants to hold true for each logical plan.
+    pub fn assert_invariants(&self) -> Result<()> {
+        // Refer to 
<https://datafusion.apache.org/contributor-guide/specification/invariants.html#relation-name-tuples-in-logical-fields-and-logical-columns-are-unique>
+        assert_unique_field_names(self)?;

Review Comment:
   I had forgotten about that documentation 🤔 
   
   Maybe it would be a good thing to inline that information into the code 
rather than in a separate document that might be harder to find (not for you, I 
am just musing here for my own benefit)



##########
datafusion/expr/src/logical_plan/mod.rs:
##########
@@ -20,6 +20,8 @@ mod ddl;
 pub mod display;
 pub mod dml;
 mod extension;
+pub(crate) mod invariants;

Review Comment:
   I didn't see a new `invariants` module in this PR 🤔 



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -1125,6 +1126,14 @@ impl LogicalPlan {
         }
     }
 
+    /// These are invariants to hold true for each logical plan.
+    pub fn assert_invariants(&self) -> Result<()> {
+        // Refer to 
<https://datafusion.apache.org/contributor-guide/specification/invariants.html#relation-name-tuples-in-logical-fields-and-logical-columns-are-unique>
+        assert_unique_field_names(self)?;
+

Review Comment:
   Good call -- I think we could/should discover/enforce new invariants as 
follow on PRs after we have sorted out this basic API



##########
datafusion/optimizer/src/analyzer/mod.rs:
##########
@@ -164,18 +172,38 @@ impl Analyzer {
             log_plan(rule.name(), &new_plan);
             observer(&new_plan, rule.as_ref());
         }
-        // for easier display in explain output
-        check_plan(&new_plan).map_err(|e| {
+
+        // verify at the end, after the last LP analyzer pass.
+        assert_valid_semantic_plan(&new_plan).map_err(|e| {
             DataFusionError::Context("check_analyzed_plan".to_string(), 
Box::new(e))
         })?;
+
         log_plan("Final analyzed plan", &new_plan);
         debug!("Analyzer took {} ms", start_time.elapsed().as_millis());
         Ok(new_plan)
     }
 }
 
-/// Do necessary check and fail the invalid plan
-fn check_plan(plan: &LogicalPlan) -> Result<()> {
+/// These are invariants which should hold true before and after each analyzer.
+///
+/// This differs from [`LogicalPlan::assert_invariants`], which addresses if a 
singular
+/// LogicalPlan is valid. Instead this address if the analyzer (before and 
after)
+/// is valid based upon permitted changes.
+///
+/// Does not check elements which are mutated by the analyzers (e.g. the 
schema).
+fn assert_valid_semantic_plan(plan: &LogicalPlan) -> Result<()> {
+    plan.assert_invariants()?;
+
+    // TODO: should this be moved to LogicalPlan::assert_invariants?
+    assert_subqueries_are_valid(plan)?;

Review Comment:
   > If I run these checks before the first analyzer pass => we get test 
failures.
   
   I think this is somewhat "expected" -- part of the Analyzer's job is to make 
LogicalPlan's more "valid" as described here 
https://docs.rs/datafusion/latest/datafusion/optimizer/trait.AnalyzerRule.html
   
   Perhaps we can make this part of `LogicalPlan::assert_invariants`, with an 
enum that explains the rules being followed? 
   
   ```rust
   enum InvariantLevel {
     /// Invariants that are always true in DataFusion `LogicalPlan`s
     /// such as the number of expected children and no duplicated output fields
     Always, 
     /// Invariants that must hold true for the plan to be "executable"
     /// such as the type and number of function arguments are correct and
     /// that wildcards have been expanded
     ///
     /// To ensure a LogicalPlan satisfies the `Executable` inariants, run the
     /// `Analyzer`
     Executable,
   }
   
   impl LogicalPlan {
     /// checks that the plan conforms to the listed invariant level, returning 
an Error if not
     fn check_invariants(&self, check: InvariantLevel) -> Result<()> {
     ...
     }
   }
   ```
   
   That way the `assert_subqueries_are_valid` check would always be available
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to