sdf-jkl commented on code in PR #18648:
URL: https://github.com/apache/datafusion/pull/18648#discussion_r2849762483


##########
datafusion/sqllogictest/test_files/datetime/timestamps.slt:
##########
@@ -5358,3 +5358,301 @@ SELECT to_timestamp(arrow_cast(100.5, 'Float16'), name) 
FROM test_to_timestamp_s
 
 statement ok
 drop table test_to_timestamp_scalar
+
+## date_trunc Preimage tests
+
+# Test with timestamp data
+statement ok
+CREATE TABLE t1(ts TIMESTAMP) AS VALUES
+  (NULL),
+  ('2023-01-15T10:30:45'::timestamp),
+  ('2024-06-20T14:25:30'::timestamp),
+  ('2024-12-31T23:59:59'::timestamp),
+  ('2025-03-10T08:15:20'::timestamp);
+
+# Test YEAR granularity - basic comparisons
+
+query P
+SELECT ts FROM t1 WHERE date_trunc('year', ts) = timestamp 
'2024-01-01T00:00:00' ORDER BY ts;

Review Comment:
   Does your comment about this still hold?
   ```sql
   SELECT  PULocationID
              ,pickup_datetime
         FROM taxi_view_2025
        WHERE date_trunc('month', pickup_datetime) = '2025-12-03'
   ```
   
   Without `preimage` it would always return `False`, but with `preimage`, we 
create an interval `[2025-12-01, 2026-01-01)` and simplification rule returns 
`col >= 2025-12-01 and col < 2026-01-01` we could get false positives, because 
`2025-12-03` falls into that interval.



##########
datafusion/functions/src/datetime/date_trunc.rs:
##########
@@ -412,6 +415,151 @@ impl ScalarUDFImpl for DateTruncFunc {
         })
     }
 
+    fn preimage(
+        &self,
+        args: &[Expr],
+        lit_expr: &Expr,
+        _info: &SimplifyContext,
+    ) -> Result<PreimageResult> {
+        // Determine what datetime granularity to use for preimage calculation
+        let [trunc_part, col_expr] = take_function_args(self.name(), args)?;
+        let granular_part = trunc_part
+            .as_literal()
+            .and_then(|sv| sv.try_as_str().flatten())
+            .map(part_normalization);
+
+        let granularity = match granular_part {
+            Some(trunc_granularity) => {
+                match DateTruncGranularity::from_str(trunc_granularity) {
+                    Ok(granularity_instance) => granularity_instance,
+                    Err(granularity_err) => {
+                        return Err(granularity_err);
+                    }
+                }
+            }
+            None => {
+                return Ok(PreimageResult::None);
+            }
+        };
+
+        fn trunc_interval_for_ts<TsType: ArrowTimestampType>(
+            ts_val: &i64,
+            ts_tz: &Option<Arc<str>>,
+            ts_granularity: DateTruncGranularity,
+        ) -> Result<Interval> {
+            let parsed_tz = parse_tz(ts_tz)?;
+            let is_calendar_granularity = matches!(
+                ts_granularity,
+                DateTruncGranularity::Year
+                    | DateTruncGranularity::Month
+                    | DateTruncGranularity::Quarter
+                    | DateTruncGranularity::Week
+                    | DateTruncGranularity::Day
+            );
+
+            // general_date_trunc converts to nanoseconds
+            let lower_val =
+                general_date_trunc(TsType::UNIT, *ts_val, parsed_tz, 
ts_granularity)?;
+            let upper_val = if is_calendar_granularity {
+                increment_timestamp_nanos_calendar(lower_val, parsed_tz, 
ts_granularity)?
+            } else {
+                increment_time_nanos(lower_val, ts_granularity)
+            };

Review Comment:
   `general_date_trunc` converts the values back to the original `TimeStamp` 
type, we shouldn't increment in nanos, but use the original `TimeUnit`.



##########
datafusion/functions/src/datetime/date_trunc.rs:
##########
@@ -412,6 +415,151 @@ impl ScalarUDFImpl for DateTruncFunc {
         })
     }
 
+    fn preimage(
+        &self,
+        args: &[Expr],
+        lit_expr: &Expr,
+        _info: &SimplifyContext,
+    ) -> Result<PreimageResult> {
+        // Determine what datetime granularity to use for preimage calculation
+        let [trunc_part, col_expr] = take_function_args(self.name(), args)?;
+        let granular_part = trunc_part
+            .as_literal()
+            .and_then(|sv| sv.try_as_str().flatten())
+            .map(part_normalization);
+
+        let granularity = match granular_part {
+            Some(trunc_granularity) => {
+                match DateTruncGranularity::from_str(trunc_granularity) {
+                    Ok(granularity_instance) => granularity_instance,
+                    Err(granularity_err) => {
+                        return Err(granularity_err);
+                    }
+                }
+            }
+            None => {
+                return Ok(PreimageResult::None);
+            }
+        };
+
+        fn trunc_interval_for_ts<TsType: ArrowTimestampType>(

Review Comment:
   Could this move out of the `preimage` impl?



##########
datafusion/functions/src/datetime/date_trunc.rs:
##########


Review Comment:
   We should add a guard for `Time` types granularities.
   We could reuse this function.



##########
datafusion/functions/src/datetime/date_trunc.rs:
##########
@@ -502,6 +650,125 @@ fn truncate_time_secs(value: i32, granularity: 
DateTruncGranularity) -> i32 {
     }
 }
 
+/// Increment time in nanoseconds by the specified granularity
+fn increment_time_nanos(value: i64, granularity: DateTruncGranularity) -> i64 {
+    match granularity {
+        DateTruncGranularity::Hour => value + NANOS_PER_HOUR,
+        DateTruncGranularity::Minute => value + NANOS_PER_MINUTE,
+        DateTruncGranularity::Second => value + NANOS_PER_SECOND,
+        DateTruncGranularity::Millisecond => value + NANOS_PER_MILLISECOND,
+        DateTruncGranularity::Microsecond => value + NANOS_PER_MICROSECOND,
+        // Other granularities are not valid for time - should be caught 
earlier
+        _ => value,
+    }
+}
+
+/// Increment time in microseconds by the specified granularity
+fn increment_time_micros(value: i64, granularity: DateTruncGranularity) -> i64 
{
+    match granularity {
+        DateTruncGranularity::Hour => value + MICROS_PER_HOUR,
+        DateTruncGranularity::Minute => value + MICROS_PER_MINUTE,
+        DateTruncGranularity::Second => value + MICROS_PER_SECOND,
+        DateTruncGranularity::Millisecond => value + MICROS_PER_MILLISECOND,
+        DateTruncGranularity::Microsecond => value + 1,
+        // Other granularities are not valid for time - should be caught 
earlier
+        _ => value,
+    }
+}
+
+/// Increment time in milliseconds by the specified granularity
+fn increment_time_millis(value: i32, granularity: DateTruncGranularity) -> i32 
{
+    match granularity {
+        DateTruncGranularity::Hour => value + MILLIS_PER_HOUR,
+        DateTruncGranularity::Minute => value + MILLIS_PER_MINUTE,
+        DateTruncGranularity::Second => value + MILLIS_PER_SECOND,
+        DateTruncGranularity::Millisecond => value + 1,

Review Comment:
   Keep match arms for granularities finer than rhs?
   ```suggestion
           DateTruncGranularity::Millisecond => value + 1,
           DateTruncGranularity::Microsecond => value + 1,
   ```



##########
datafusion/sqllogictest/test_files/datetime/timestamps.slt:
##########
@@ -5358,3 +5358,301 @@ SELECT to_timestamp(arrow_cast(100.5, 'Float16'), name) 
FROM test_to_timestamp_s
 
 statement ok
 drop table test_to_timestamp_scalar
+
+## date_trunc Preimage tests
+
+# Test with timestamp data
+statement ok
+CREATE TABLE t1(ts TIMESTAMP) AS VALUES
+  (NULL),
+  ('2023-01-15T10:30:45'::timestamp),
+  ('2024-06-20T14:25:30'::timestamp),
+  ('2024-12-31T23:59:59'::timestamp),
+  ('2025-03-10T08:15:20'::timestamp);
+
+# Test YEAR granularity - basic comparisons
+
+query P
+SELECT ts FROM t1 WHERE date_trunc('year', ts) = timestamp 
'2024-01-01T00:00:00' ORDER BY ts;

Review Comment:
   It was actually covered for `floor preimage` impl by @devanshu0987 in #20059
   Check here:
   
https://github.com/apache/datafusion/pull/20059/changes#diff-077176fcf22cb36a0a51631a43739f5f015f46305be4f49142a450e25b152b84R280-R303
   `Floor` is very similar to `date_trunc`, so we could replicate the behavior.



##########
datafusion/functions/src/datetime/date_trunc.rs:
##########
@@ -502,6 +650,125 @@ fn truncate_time_secs(value: i32, granularity: 
DateTruncGranularity) -> i32 {
     }
 }
 
+/// Increment time in nanoseconds by the specified granularity
+fn increment_time_nanos(value: i64, granularity: DateTruncGranularity) -> i64 {
+    match granularity {
+        DateTruncGranularity::Hour => value + NANOS_PER_HOUR,
+        DateTruncGranularity::Minute => value + NANOS_PER_MINUTE,
+        DateTruncGranularity::Second => value + NANOS_PER_SECOND,
+        DateTruncGranularity::Millisecond => value + NANOS_PER_MILLISECOND,
+        DateTruncGranularity::Microsecond => value + NANOS_PER_MICROSECOND,
+        // Other granularities are not valid for time - should be caught 
earlier
+        _ => value,
+    }
+}
+
+/// Increment time in microseconds by the specified granularity
+fn increment_time_micros(value: i64, granularity: DateTruncGranularity) -> i64 
{
+    match granularity {
+        DateTruncGranularity::Hour => value + MICROS_PER_HOUR,
+        DateTruncGranularity::Minute => value + MICROS_PER_MINUTE,
+        DateTruncGranularity::Second => value + MICROS_PER_SECOND,
+        DateTruncGranularity::Millisecond => value + MICROS_PER_MILLISECOND,
+        DateTruncGranularity::Microsecond => value + 1,
+        // Other granularities are not valid for time - should be caught 
earlier
+        _ => value,
+    }
+}
+
+/// Increment time in milliseconds by the specified granularity
+fn increment_time_millis(value: i32, granularity: DateTruncGranularity) -> i32 
{
+    match granularity {
+        DateTruncGranularity::Hour => value + MILLIS_PER_HOUR,
+        DateTruncGranularity::Minute => value + MILLIS_PER_MINUTE,
+        DateTruncGranularity::Second => value + MILLIS_PER_SECOND,
+        DateTruncGranularity::Millisecond => value + 1,
+        // Other granularities are not valid for time - should be caught 
earlier
+        _ => value,
+    }
+}
+
+/// Increment time in seconds by the specified granularity
+fn increment_time_secs(value: i32, granularity: DateTruncGranularity) -> i32 {
+    match granularity {
+        DateTruncGranularity::Hour => value + SECS_PER_HOUR,
+        DateTruncGranularity::Minute => value + SECS_PER_MINUTE,
+        DateTruncGranularity::Second => value + 1,
+        // Other granularities are not valid for time - should be caught 
earlier
+        _ => value,

Review Comment:
   return `Err` or `None`?



##########
datafusion/functions/src/datetime/date_trunc.rs:
##########
@@ -412,6 +415,151 @@ impl ScalarUDFImpl for DateTruncFunc {
         })
     }
 
+    fn preimage(
+        &self,
+        args: &[Expr],
+        lit_expr: &Expr,
+        _info: &SimplifyContext,
+    ) -> Result<PreimageResult> {
+        // Determine what datetime granularity to use for preimage calculation
+        let [trunc_part, col_expr] = take_function_args(self.name(), args)?;
+        let granular_part = trunc_part
+            .as_literal()
+            .and_then(|sv| sv.try_as_str().flatten())
+            .map(part_normalization);
+

Review Comment:
   We should add a guard for Type families. `col_expr: TimeStamp` needs a 
`lit_expr: TimeStamp` and the same for `Time` types.



##########
datafusion/sqllogictest/test_files/datetime/timestamps.slt:
##########
@@ -5358,3 +5358,301 @@ SELECT to_timestamp(arrow_cast(100.5, 'Float16'), name) 
FROM test_to_timestamp_s
 
 statement ok
 drop table test_to_timestamp_scalar
+
+## date_trunc Preimage tests
+
+# Test with timestamp data
+statement ok
+CREATE TABLE t1(ts TIMESTAMP) AS VALUES
+  (NULL),
+  ('2023-01-15T10:30:45'::timestamp),
+  ('2024-06-20T14:25:30'::timestamp),
+  ('2024-12-31T23:59:59'::timestamp),
+  ('2025-03-10T08:15:20'::timestamp);
+
+# Test YEAR granularity - basic comparisons
+
+query P
+SELECT ts FROM t1 WHERE date_trunc('year', ts) = timestamp 
'2024-01-01T00:00:00' ORDER BY ts;

Review Comment:
   We'd need to change the behavior to cover this.
   
   - One way would be by having a guard checking the `eq Operator` specifically 
for `date_trunc preimage` and return `PreimageResult::None` if `rhs != 
date_trunc(granularity, rhs)`. But `preimage` is `Operator` agnostic.
   - Another way is by having an optimization rule to do check this `rhs != 
date_trunc(granularity, rhs)` and return `False` for the whole column. But 
that's adding a rule just for one udf.
   - Another way is to only let `date_trunc preimage` work with with `rhs = 
date_trunc(granularity, rhs)`, but this requires the user to write the date in 
the right way if they want the query to run faster.
   For example:
   ```diff
   ++WHERE date_trunc('month', pickup_datetime) = '2025-12-01'
   --WHERE date_trunc('month', pickup_datetime) = '2025-12-03'
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to