alamb commented on code in PR #19733:
URL: https://github.com/apache/datafusion/pull/19733#discussion_r2738697071


##########
datafusion/sqllogictest/test_files/udf_preimage.slt:
##########
@@ -0,0 +1,545 @@
+# Licensed to the Apache Software Foundation (ASF) under one

Review Comment:
   I recommend adding this date_part.slt to keep the code for date_part together



##########
datafusion/functions/src/datetime/date_part.rs:
##########
@@ -251,6 +321,56 @@ fn is_epoch(part: &str) -> bool {
     matches!(part.to_lowercase().as_str(), "epoch")
 }
 
+fn date_to_scalar(date: NaiveDate, target_type: &DataType) -> 
Option<ScalarValue> {
+    let days = date
+        .signed_duration_since(NaiveDate::from_epoch_days(0)?)
+        .num_days();
+
+    Some(match target_type {
+        Date32 => ScalarValue::Date32(Some(days as i32)),
+        Date64 => ScalarValue::Date64(Some(days * MILLISECONDS_IN_DAY)),
+
+        Timestamp(unit, tz_opt) => {

Review Comment:
   It feels to me like this code should be able to re-use the conversion 
functions in arrow rather than re-implementing them here
   
   For example
   ```rust
           Date32 => 
ScalarValue::Date32(Some(Date32Type::from_naive_date(date))),
           Date64 => 
ScalarValue::Date64(Some(Date64Type::from_naive_date(date))),
   ...
   ```
   
   I didn't have a chance to figure out how to do it for Timestamp, but it 
seems like there should be a function like this for  the timestamps too -- for 
example 
   
   
https://docs.rs/arrow/latest/arrow/array/types/struct.TimestampSecondType.html 
and https://docs.rs/arrow/latest/arrow/datatypes/trait.ArrowTimestampType.html
   
   Maybe something like 
   ```rust
   TimestampSecondType::make_value(date)
   ```
   
   (We will have to figure out timestamps)



##########
datafusion/functions/src/datetime/date_part.rs:
##########
@@ -237,6 +243,68 @@ impl ScalarUDFImpl for DatePartFunc {
         })
     }
 
+    // Only casting the year is supported since pruning other IntervalUnit is 
not possible
+    // date_part(col, YEAR) = 2024 => col >= '2024-01-01' and col < 
'2025-01-01'
+    // But for anything less than YEAR simplifying is not possible without 
specifying the bigger interval
+    // date_part(col, MONTH) = 1 => col = '2023-01-01' or col = '2024-01-01' 
or ... or col = '3000-01-01'
+    fn preimage(

Review Comment:
   I agree that the API docs is probably adequate. We could potentially add a 
note to `adding-udfs.md` that says something generic like "The ScalarUDFImpl 
has additional methods that support specialized optimizations such as 
`preimage` -- see the API documentation for additional details"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to