alamb commented on code in PR #18789:
URL: https://github.com/apache/datafusion/pull/18789#discussion_r2561161409


##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -1956,6 +1962,36 @@ impl<S: SimplifyInfo> TreeNodeRewriter for Simplifier<'_, S> {
                 }))
             }
 
+            // =======================================

Review Comment:
   > Probably we can refactor all the operators like +, -, < to use the ScalarUDF 
interface? Fundamentally they're all nodes in the expression tree.
   > The refactoring process can get a bit painful, but it's appealing to see a 
unified interface for all the expressions in the long term.
   
   Yes, in the long term making all operators use ScalarUDF might make sense -- 
however, it would be pretty painful, as `Operator` is passed around in many 
different places, and we would have to extract that out first.
   
   
   I have a suggestion for a slightly different API following ClickHouse's 
model (`preimage`) that I think might make more sense for this particular case



##########
datafusion/sqllogictest/test_files/udf_preimage.slt:
##########
@@ -0,0 +1,170 @@
+# Licensed to the Apache Software Foundation (ASF) under one

Review Comment:
   I haven't reviewed the test coverage yet, but can we please ensure we have 
added all the corresponding tests from duckdb (both initial PR and the fix):
   - https://github.com/duckdb/duckdb/pull/18457
   - https://github.com/duckdb/duckdb/pull/19628
   
   
   We can probably copy most of the tests from the following places (they use 
sqllogictests as well)
   
https://github.com/duckdb/duckdb/blob/main/test/optimizer/date_trunc_simplification_icu.test
   
https://github.com/duckdb/duckdb/blob/main/test/optimizer/date_trunc_simplification.test



##########
datafusion/optimizer/src/simplify_expressions/udf_preimage.rs:
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use std::str::FromStr;
+
+use arrow::compute::kernels::cast_utils::IntervalUnit;
+use datafusion_common::{internal_err, tree_node::Transformed, Result, ScalarValue};
+use datafusion_expr::{
+    and, expr::ScalarFunction, lit, or, simplify::SimplifyInfo, BinaryExpr, Expr,
+    Operator, ScalarUDFImpl,
+};
+use datafusion_functions::datetime::date_part::DatePartFunc;
+
+pub(super) fn preimage_in_comparison_for_binary(
+    info: &dyn SimplifyInfo,
+    udf_expr: Expr,
+    literal: Expr,
+    op: Operator,
+) -> Result<Transformed<Expr>> {
+    let (func, args, lit_value) = match (udf_expr, literal) {
+        (
+            Expr::ScalarFunction(ScalarFunction { func, args }),
+            Expr::Literal(lit_value, _),
+        ) => (func, args, lit_value),
+        _ => return internal_err!("Expect date_part expr and literal"),
+    };
+    let expr = Box::new(args[1].clone());

Review Comment:
   You can assume that by this time the arguments will have been properly 
coerced and type checked.



##########
datafusion/expr/src/udf.rs:
##########
@@ -696,6 +697,35 @@ pub trait ScalarUDFImpl: Debug + DynEq + DynHash + Send + Sync {
         Ok(ExprSimplifyResult::Original(args))
     }
 
+    /// Attempts to convert a literal value to the corresponding datatype
+    /// of a column expression so that a **preimage** can be computed for
+    /// pruning comparison predicates.
+    ///
+    /// This is used during predicate-pushdown optimization
+    /// (see `datafusion-optimizer-udf_preimage::preimage_in_comparison_for_binary`)
+    ///
+    /// Currently is only implemented by:
+    /// - `date_part(YEAR, expr)`
+    ///
+    /// # Arguments:
+    /// * `lit_value`:  The literal `&ScalarValue` used in comparison
+    /// * `target_type`: The datatype of the column expression inside the function
+    /// * `op`: The comparison `Operator` (e.g. `=`, `<`, `>=`).
+    ///
+    /// # Returns
+    ///
+    /// Returns a `ScalarValue` converted to the appropriate target type if a
+    /// preimage cast is supported for the given function/operator combination;
+    /// otherwise returns `None`.
+    fn preimage_cast(

Review Comment:
   I dug around in ClickHouse and here is their corresponding API:
   
   
https://github.com/ClickHouse/ClickHouse/blob/9423252332fca10974a77c51273cf722f562bf4f/src/Functions/IFunction.h#L305-L308
    
   And here is the implementation for the date/time transforms
   
https://github.com/ClickHouse/ClickHouse/blob/9423252332fca10974a77c51273cf722f562bf4f/src/Functions/DateTimeTransforms.h#L1406
   
https://github.com/ClickHouse/ClickHouse/blob/9423252332fca10974a77c51273cf722f562bf4f/src/Functions/DateTimeTransforms.h#L2286
   
   And here is how it is used:
   
https://github.com/ClickHouse/ClickHouse/blob/9423252332fca10974a77c51273cf722f562bf4f/src/Analyzer/Passes/OptimizeDateOrDateTimeConverterWithPreimagePass.cpp#L126-L139
   
   
   So in other words, perhaps we could make this API return an 
[`Interval`](https://docs.rs/datafusion/latest/datafusion/logical_expr/interval_arithmetic/struct.Interval.html)
 rather than rewriting the entire function internally.
   
   Something like this perhaps:
   
   ```rust
       fn preimage(&self, args: &[Expr]) -> Option<Interval> { 
         None 
       }
   ```
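
To make the shape of such an API concrete, here is a minimal standalone sketch; it is not DataFusion code. `DayRange`, `Preimage`, `ExtractYear`, and `Opaque` are hypothetical names invented for illustration, and `DayRange` is only a stand-in for DataFusion's `Interval`. The sketch assumes dates are stored as days since 1970-01-01, takes a plain integer literal instead of `&[Expr]`, and uses a half-open range.

```rust
/// Half-open range `[lower, upper)` of inputs that map to one output value.
/// Hypothetical stand-in for DataFusion's `Interval`, not the real type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DayRange {
    pub lower: i64, // inclusive, in days since 1970-01-01
    pub upper: i64, // exclusive, in days since 1970-01-01
}

/// Days since 1970-01-01 for a proleptic Gregorian calendar date.
pub fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Shift the year so that it starts in March; leap days then fall at the end.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let year_of_era = y.rem_euclid(400);
    let shifted_month = (month + 9) % 12; // March = 0, ..., February = 11
    let day_of_year = (153 * shifted_month + 2) / 5 + day - 1;
    let day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + day_of_year;
    era * 146_097 + day_of_era - 719_468
}

/// Hypothetical trait mirroring the proposed `preimage` hook.
pub trait Preimage {
    /// Range of inputs whose image is `value`, or `None` if unknown.
    fn preimage(&self, _value: i64) -> Option<DayRange> {
        None // default: the function does not expose a preimage
    }
}

/// Illustrative function that extracts the year from a date.
pub struct ExtractYear;

impl Preimage for ExtractYear {
    fn preimage(&self, year: i64) -> Option<DayRange> {
        Some(DayRange {
            lower: days_from_civil(year, 1, 1),
            upper: days_from_civil(year + 1, 1, 1),
        })
    }
}

/// Illustrative function with no known preimage; it keeps the default.
pub struct Opaque;

impl Preimage for Opaque {}
```

With these definitions, `ExtractYear.preimage(2024)` yields `Some(DayRange { lower: 19723, upper: 20089 })`, that is, 2024-01-01 up to but excluding 2025-01-01, while `Opaque.preimage(2024)` yields `None` and the optimizer would leave the predicate untouched.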
   
   
   



##########
datafusion/optimizer/src/simplify_expressions/udf_preimage.rs:
##########
@@ -0,0 +1,407 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use std::str::FromStr;
+
+use arrow::compute::kernels::cast_utils::IntervalUnit;
+use datafusion_common::{internal_err, tree_node::Transformed, Result, ScalarValue};
+use datafusion_expr::{
+    and, expr::ScalarFunction, lit, or, simplify::SimplifyInfo, BinaryExpr, Expr,
+    Operator, ScalarUDFImpl,
+};
+use datafusion_functions::datetime::date_part::DatePartFunc;
+
+pub(super) fn preimage_in_comparison_for_binary(
+    info: &dyn SimplifyInfo,
+    udf_expr: Expr,
+    literal: Expr,
+    op: Operator,
+) -> Result<Transformed<Expr>> {
+    let (func, args, lit_value) = match (udf_expr, literal) {
+        (
+            Expr::ScalarFunction(ScalarFunction { func, args }),
+            Expr::Literal(lit_value, _),
+        ) => (func, args, lit_value),
+        _ => return internal_err!("Expect date_part expr and literal"),
+    };
+    let expr = Box::new(args[1].clone());
+
+    let Ok(expr_type) = info.get_data_type(&expr) else {
+        return internal_err!("Can't get the data type of the expr {:?}", &expr);
+    };
+
+    let preimage_func = match func.name() {
+        "date_part" => DatePartFunc::new(),
+        _ => return internal_err!("Preimage is not supported for {:?}", func.name()),
+    };
+
+    let rewritten_expr = match op {
+        Operator::Lt | Operator::GtEq => {
+            let v = match preimage_func.preimage_cast(&lit_value, &expr_type, op) {
+                Some(v) => v,
+                None => {
+                    return internal_err!("Could not cast literal to the column type")
+                }
+            };

Review Comment:
   Yeah, I think you could avoid doing so by following the pattern in DuckDB or 
ClickHouse.
   
   I think it goes something like
   ```
   x = <expr>
   ```
   
   If you can get a 
[pre-image](https://en.wikipedia.org/wiki/Image_(mathematics)#Image_of_a_function)
 for `<expr>` (i.e. you know the entire single range for which it is valid), 
then you can rewrite this to
   
   ```
   x >= preimage_min && x <= preimage_max
   ```
   
   And similarly for other operators 🤔 
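
As a rough illustration of what "similarly for other operators" could look like, the sketch below maps each comparison onto range predicates. It is standalone and hypothetical: `CmpOp` and `rewrite_comparison` are invented names, predicates are rendered as strings rather than `Expr` trees, the function under the comparison is assumed to be monotonically non-decreasing, and the preimage is taken as a half-open range `[lower, upper)` rather than the closed range written above.

```rust
/// Comparison operators handled by the sketch (hypothetical enum, not
/// DataFusion's `Operator`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CmpOp {
    Eq,
    NotEq,
    Lt,
    LtEq,
    Gt,
    GtEq,
}

/// Rewrites `f(col) <op> literal` into a predicate on `col` alone.
///
/// `[lower, upper)` is the preimage of `literal` under `f`, and `f` is
/// assumed to be monotonically non-decreasing. The result is rendered as
/// text purely for illustration.
pub fn rewrite_comparison(col: &str, op: CmpOp, lower: i64, upper: i64) -> String {
    match op {
        // f(col) = v   <=>  col lies inside the preimage
        CmpOp::Eq => format!("{col} >= {lower} AND {col} < {upper}"),
        // f(col) != v  <=>  col lies outside the preimage
        CmpOp::NotEq => format!("{col} < {lower} OR {col} >= {upper}"),
        // f(col) < v   <=>  col lies before the preimage starts
        CmpOp::Lt => format!("{col} < {lower}"),
        // f(col) <= v  <=>  col lies before the preimage ends
        CmpOp::LtEq => format!("{col} < {upper}"),
        // f(col) > v   <=>  col lies at or after the end of the preimage
        CmpOp::Gt => format!("{col} >= {upper}"),
        // f(col) >= v  <=>  col lies at or after the start of the preimage
        CmpOp::GtEq => format!("{col} >= {lower}"),
    }
}
```

For example, if the year 2024 corresponds to days 19723 (2024-01-01) through 20089 (2025-01-01, exclusive), then `rewrite_comparison("d", CmpOp::Eq, 19723, 20089)` produces `d >= 19723 AND d < 20089`, a predicate on the bare column.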
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

