findepi commented on code in PR #17351: URL: https://github.com/apache/datafusion/pull/17351#discussion_r2324504677
########## datafusion/physical-expr-common/src/physical_expr.rs: ########## @@ -377,6 +377,19 @@ pub trait PhysicalExpr: Any + Send + Sync + Display + Debug + DynEq + DynHash { // static expressions will always return 0. 0 } + + /// Returns true if the expression node is volatile, i.e. whether it can return + /// different results when evaluated multiple times with the same input. + /// + /// Note: unlike [`is_volatile`], this function does not consider inputs: + /// - `random()` returns `true`, + /// - `a + random()` returns `false` + /// + /// By default, expressions are not volatile to avoid imposing API churn on implementers. + /// It is highly recommended that volatile expressions implement this method and return `true`. + fn is_volatile_node(&self) -> bool { Review Comment: > Also maybe `is_volatile` is enough `is_volatile` should return whether the "expression is volatile". it would be enough, but it needs to traverse the expr tree > I could see this being missed + difficult for reviewers to keep in mind when new physical expressions are added. The default impl is not a safe default. What happens when someone has existing volatile `PhysicalExpr` implementation and doesn't override this method? ########## datafusion/physical-expr-common/src/physical_expr.rs: ########## @@ -377,6 +377,19 @@ pub trait PhysicalExpr: Any + Send + Sync + Display + Debug + DynEq + DynHash { // static expressions will always return 0. 0 } + + /// Returns true if the expression node is volatile, i.e. whether it can return + /// different results when evaluated multiple times with the same input. + /// + /// Note: unlike [`is_volatile`], this function does not consider inputs: + /// - `random()` returns `true`, + /// - `a + random()` returns `false` + /// + /// By default, expressions are not volatile to avoid imposing API churn on implementers. + /// It is highly recommended that volatile expressions implement this method and return `true`. + fn is_volatile_node(&self) -> bool { + false Review Comment: This is likely not a safe default. What happens when someone has existing volatile PhysicalExpr implementation and doesn't override this method? Today the answer is "nothing", as these functions are not used for anything. But as soon as they are used for something (eg some optimization pass), this not safe a default may lead to correctness bugs (like constant-folding a rand call). ########## datafusion/physical-expr-common/src/physical_expr.rs: ########## @@ -560,3 +573,26 @@ pub fn is_dynamic_physical_expr(expr: &Arc<dyn PhysicalExpr>) -> bool { // If the generation is non-zero, then this `PhysicalExpr` is dynamic. snapshot_generation(expr) != 0 } + +/// Returns true if the expression is volatile, i.e. whether it can return different +/// results when evaluated multiple times with the same input. +/// +/// For example the function call `RANDOM()` is volatile as each call will +/// return a different value. +/// +/// This method recursively checks if any sub-expression is volatile, for example +/// `1 + RANDOM()` will return `true`. +pub fn is_volatile(expr: &Arc<dyn PhysicalExpr>) -> bool { + let mut is_volatile = expr.is_volatile_node(); + expr.apply(|e| { + is_volatile = is_volatile || e.is_volatile_node(); Review Comment: No need to call `expr.apply` when `is_volatile` is already true. Does it matter? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org