alamb commented on code in PR #14329:
URL: https://github.com/apache/datafusion/pull/14329#discussion_r1934505630


##########
datafusion/expr/src/logical_plan/extension.rs:
##########
@@ -54,6 +57,22 @@ pub trait UserDefinedLogicalNode: fmt::Debug + Send + Sync {
     /// Return the output schema of this logical plan node.
     fn schema(&self) -> &DFSchemaRef;
 
+    /// Return the list of invariants.
+    ///
+    /// Implementing this function enables the user to define the
+    /// invariants for a given logical plan extension.
+    fn invariants(&self) -> Vec<Invariant> {
+        vec![]
+    }
+
+    /// Perform check of invariants for the extension node.
+    fn check_invariants(&self, check: InvariantLevel, plan: &LogicalPlan) -> 
Result<()> {

Review Comment:
   I didn't understand the need to have both 
   
   ```rust
       fn invariants(&self) -> Vec<Invariant> {
   ```
   
   And 
   ```
       fn check_invariants(&self, check: InvariantLevel, plan: &LogicalPlan) -> 
Result<()> {
   ```
   
   If there is a reason to have both, can you please add to the documentation 
guidance for when the user would define one or the other?  
   
   If there is no reason for both, perhaps we can start with just adding 
`check_invariants` (and remove `invariants` from thi PR)  which is the more 
flexible API 



##########
datafusion/expr/src/logical_plan/invariants.rs:
##########
@@ -28,6 +30,24 @@ use crate::{
     Aggregate, Expr, Filter, Join, JoinType, LogicalPlan, Window,
 };
 
+use super::Extension;
+
+pub type InvariantFn = Arc<dyn Fn(&LogicalPlan) -> Result<()> + Send + Sync>;
+
+#[derive(Clone)]
+pub struct Invariant {
+    pub kind: InvariantLevel,
+    pub fun: InvariantFn,
+}
+
+impl Invariant {
+    /// Return an error if invariant does not hold true.
+    pub fn check(&self, plan: &LogicalPlan) -> Result<()> {
+        (self.fun)(plan)
+    }
+}

Review Comment:
   I suggest we defer user defined invariants on pre-existing LogicalPlan nodes 
until there is a clear usecase.
   
   I think the same thing can be achieved today with a custom OptimizerRule
   
   I am thinking it would be good to put the existing infrastructure out there 
and see how it works before we add more things to it



##########
datafusion/expr/src/logical_plan/invariants.rs:
##########
@@ -41,19 +61,54 @@ pub enum InvariantLevel {
     Executable,
 }
 
-pub fn assert_always_invariants(plan: &LogicalPlan) -> Result<()> {
+/// Apply the [`InvariantLevel::Always`] check at the current plan node only.

Review Comment:
   ```suggestion
   /// Apply the [`InvariantLevel::Always`] check at the current plan node only.
   ///
   /// This does not recurs to any child nodes
   ```



##########
datafusion/expr/src/logical_plan/invariants.rs:
##########
@@ -41,19 +61,54 @@ pub enum InvariantLevel {
     Executable,
 }
 
-pub fn assert_always_invariants(plan: &LogicalPlan) -> Result<()> {
+/// Apply the [`InvariantLevel::Always`] check at the current plan node only.
+pub fn assert_always_invariants_at_current_node(plan: &LogicalPlan) -> 
Result<()> {
     // Refer to 
<https://datafusion.apache.org/contributor-guide/specification/invariants.html#relation-name-tuples-in-logical-fields-and-logical-columns-are-unique>
     assert_unique_field_names(plan)?;
 
     Ok(())
 }
 
+/// Visit the plan nodes, and confirm the [`InvariantLevel::Executable`]
+/// as well as the less stringent [`InvariantLevel::Always`] checks.
 pub fn assert_executable_invariants(plan: &LogicalPlan) -> Result<()> {
-    assert_always_invariants(plan)?;
+    // Always invariants
+    assert_always_invariants_at_current_node(plan)?;
+    assert_valid_extension_nodes(plan, InvariantLevel::Always)?;
+
+    // Executable invariants
+    assert_valid_extension_nodes(plan, InvariantLevel::Executable)?;
     assert_valid_semantic_plan(plan)?;
     Ok(())
 }
 
+/// Asserts that the query plan, and subplan, extension nodes have valid 
invariants.
+///
+/// Refer to 
[`UserDefinedLogicalNode::check_invariants`](super::UserDefinedLogicalNode)
+/// for more details of user-provided extension node invariants.
+fn assert_valid_extension_nodes(plan: &LogicalPlan, check: InvariantLevel) -> 
Result<()> {

Review Comment:
   As written I think this this does a separate walk of the tree than 
`assert_subqueries_are_valid` 
   
   Maybe as a follow on PR we could unify the walks (so the tree gets walked 
once and all checks applied) rather than two separate plans



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to