findepi commented on code in PR #17521: URL: https://github.com/apache/datafusion/pull/17521#discussion_r2341148014
########## datafusion/sqllogictest/test_files/push_down_filter.slt: ########## @@ -395,3 +395,28 @@ order by t1.k, t2.v; ---- 1 1 1 10000000 10000000 10000000 + +# Regression test for https://github.com/apache/datafusion/issues/17512 + +query I +COPY ( + SELECT arrow_cast('2025-01-01T00:00:00Z'::timestamptz, 'Timestamp(Microsecond, Some("UTC"))') AS start_timestamp +) +TO 'test_files/scratch/push_down_filter/17512.parquet' +STORED AS PARQUET; +---- +1 + +statement ok +CREATE EXTERNAL TABLE records STORED AS PARQUET LOCATION 'test_files/scratch/push_down_filter/17512.parquet'; + +query I +SELECT 1 +FROM ( + SELECT start_timestamp + FROM records + WHERE start_timestamp <= '2025-01-01T00:00:00Z'::timestamptz +) AS t +WHERE t.start_timestamp::time < '00:00:01'::time; Review Comment: It looks like one predicate is on `start_timestamp` and the other is on `start_timestamp::time`. from predicate pushdown perspective, the latter is useless. from `find_most_restrictive_predicate` perspective, a predicate `c < x1` and `f(c) < x2` are incomparable. They need to take the `f` into account. The optimizer that does that is called "unwrap cast in comparison" AFAICT. The `find_most_restrictive_predicate` should operate only on predicates comparing column `c` directly, ignoring those which compare `f(c)`. ########## datafusion/optimizer/src/simplify_expressions/simplify_predicates.rs: ########## @@ -204,7 +204,13 @@ fn find_most_restrictive_predicate( if let Some(scalar) = scalar_value { if let Some(current_best) = best_value { - let comparison = scalar.try_cmp(current_best)?; + // Only compare if the scalar values have compatible types + // If they don't have compatible types, we can't determine which is more restrictive + let Ok(comparison) = scalar.try_cmp(current_best) else { + // Can't compare - types are incompatible, so we can't simplify Review Comment: Are `predicates: &[Expr]` supposed to be predicates on one column, or arbitrary predicates in the query? In the first case, they must be same type, so comparable and error should be propagated. If they happen to be incomparable, maybe the plan got degenerated (eg sides of a comparison have different types). If the predicates are not on one column, the comparison is meaningless and should not be attempted. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org