findepi commented on code in PR #15110: URL: https://github.com/apache/datafusion/pull/15110#discussion_r1989510726
########## datafusion/optimizer/src/simplify_expressions/unwrap_cast.rs: ########## @@ -177,6 +192,45 @@ pub(super) fn is_cast_expr_and_support_unwrap_cast_in_comparison_for_inlist< true } +fn cast_literal_to_type_with_op( + lit_value: &ScalarValue, + target_type: &DataType, + op: Operator, +) -> Option<ScalarValue> { + macro_rules! cast_or_else_return_none { + ($value:ident, $ty:expr) => {{ + let opts = arrow::compute::CastOptions { + safe: false, + format_options: Default::default(), + }; + let array = ScalarValue::to_array($value).ok()?; + let casted = arrow::compute::cast_with_options(&array, &$ty, &opts).ok()?; + let scalar = ScalarValue::try_from_array(&casted, 0).ok()?; + Some(scalar) + }}; + } + + match (op, lit_value) { + ( + Operator::Eq | Operator::NotEq, + ScalarValue::Utf8(Some(_)) + | ScalarValue::Utf8View(Some(_)) + | ScalarValue::LargeUtf8(Some(_)), + ) => match target_type { + DataType::Int8 => cast_or_else_return_none!(lit_value, DataType::Int8), + DataType::Int16 => cast_or_else_return_none!(lit_value, DataType::Int16), + DataType::Int32 => cast_or_else_return_none!(lit_value, DataType::Int32), + DataType::Int64 => cast_or_else_return_none!(lit_value, DataType::Int64), + DataType::UInt8 => cast_or_else_return_none!(lit_value, DataType::UInt8), + DataType::UInt16 => cast_or_else_return_none!(lit_value, DataType::UInt16), + DataType::UInt32 => cast_or_else_return_none!(lit_value, DataType::UInt32), + DataType::UInt64 => cast_or_else_return_none!(lit_value, DataType::UInt64), + _ => None, + }, + _ => None, Review Comment: As it's written, and as the function is named, this looks like generic approach that will work many different source/target type pairs, we just need to add logic. However, the exact checks necessary for soundness will likely differ. For casts between exact numbers (ints and decimals) and string types, the https://github.com/apache/datafusion/pull/15110#issuecomment-2714474220 looks like a sound plan. Plus, we could also simplify impossible target literals like this - `cast(an_int as string) = 'abc'` or `cast(an_int as string) = '007'` to `if(an_int is null, null, false)` For casts between numbers and numbers, we likely need to recognize max addressible values, and cast impreciseness, especially when we want to handle `<`, `<=`, `>`, `>=` operators. https://github.com/trinodb/trino/blob/a870656f406b0f76e97c740659a58554d472994d/core/trino-main/src/main/java/io/trino/sql/planner/iterative/rule/UnwrapCastInComparison.java#L199-L354 might serve as a good reference. Note how it handles NaN, max values, lossy round-tripping, sometimes converting `<` to `<=` as necessary. For e.g cast from timestamp to date, the logic will likely be different again https://github.com/trinodb/trino/blob/a870656f406b0f76e97c740659a58554d472994d/core/trino-main/src/main/java/io/trino/sql/planner/iterative/rule/UnwrapCastInComparison.java#L357-L389. With all this in mind, I'd suggest structuring this code so it's clear it addresses only the exact number and string case, by checking expr and literal types and delegating to a function that has "int" and "utf8" or "string" in the name. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org