alamb commented on code in PR #19733:
URL: https://github.com/apache/datafusion/pull/19733#discussion_r2738697071
##########
datafusion/sqllogictest/test_files/udf_preimage.slt:
##########
@@ -0,0 +1,545 @@
+# Licensed to the Apache Software Foundation (ASF) under one
Review Comment:
I recommend adding this date_part.slt to keep the code for date_part together
##########
datafusion/functions/src/datetime/date_part.rs:
##########
@@ -251,6 +321,56 @@ fn is_epoch(part: &str) -> bool {
matches!(part.to_lowercase().as_str(), "epoch")
}
+fn date_to_scalar(date: NaiveDate, target_type: &DataType) ->
Option<ScalarValue> {
+ let days = date
+ .signed_duration_since(NaiveDate::from_epoch_days(0)?)
+ .num_days();
+
+ Some(match target_type {
+ Date32 => ScalarValue::Date32(Some(days as i32)),
+ Date64 => ScalarValue::Date64(Some(days * MILLISECONDS_IN_DAY)),
+
+ Timestamp(unit, tz_opt) => {
Review Comment:
It feels to me like this code should be able to re-use the conversion
functions in arrow rather than re-implementing them here
For example
```rust
Date32 =>
ScalarValue::Date32(Some(Date32Type::from_naive_date(date))),
Date64 =>
ScalarValue::Date64(Some(Date64Type::from_naive_date(date))),
...
```
I didn't have a chance to figure out how to do it for Timestamp, but it
seems like there should be a function like this for the timestamps too -- for
example
https://docs.rs/arrow/latest/arrow/array/types/struct.TimestampSecondType.html
and https://docs.rs/arrow/latest/arrow/datatypes/trait.ArrowTimestampType.html
Maybe something like
```rust
TimestampSecondType::make_value(date)
```
(We will have to figure out timestamps)
##########
datafusion/functions/src/datetime/date_part.rs:
##########
@@ -237,6 +243,68 @@ impl ScalarUDFImpl for DatePartFunc {
})
}
+ // Only casting the year is supported since pruning other IntervalUnit is
not possible
+ // date_part(col, YEAR) = 2024 => col >= '2024-01-01' and col <
'2025-01-01'
+ // But for anything less than YEAR simplifying is not possible without
specifying the bigger interval
+ // date_part(col, MONTH) = 1 => col = '2023-01-01' or col = '2024-01-01'
or ... or col = '3000-01-01'
+ fn preimage(
Review Comment:
I agree that the API docs is probably adequate. We could potentially add a
note to `adding-udfs.md` that says something generic like "The ScalarUDFImpl
has additional methods that support specialized optimizations such as
`preimage` -- see the API documentation for additional details"
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]