jcsherin commented on code in PR #12811:
URL: https://github.com/apache/datafusion/pull/12811#discussion_r1799370878
##########
datafusion/physical-plan/src/windows/mod.rs:
##########
@@ -257,31 +257,55 @@ fn create_built_in_window_expr(
}
}
BuiltInWindowFunction::Lag => {
- let arg = Arc::clone(&args[0]);
+ let mut arg = Arc::clone(&args[0]);
let shift_offset = get_scalar_value_from_args(args, 1)?
.map(get_signed_integer)
.map_or(Ok(None), |v| v.map(Some))?;
- let default_value =
- get_casted_value(get_scalar_value_from_args(args, 2)?,
out_data_type)?;
+ // If value is NULL, we use default data type as output data type,
no need to cast data type
+ let default_value = match out_data_type {
+ DataType::Null => match get_scalar_value_from_args(args, 2)? {
+ Some(value) => {
+ let null_value =
ScalarValue::try_from(value.data_type())?;
+ arg = Arc::new(Literal::new(null_value));
+ value
+ }
+ None => ScalarValue::try_from(DataType::Null)?,
+ },
+ _ => {
+ get_casted_value(get_scalar_value_from_args(args, 2)?,
out_data_type)?
+ }
+ };
Review Comment:
I think concerns got intertwined here making this code a bit hard to read.
Tbh this bit of code was hard to understand even in the absence of this fix 😅.
I know because I had some difficult in correctly moving this code for
parsing the window function arguments when converting it to a `WindowUDF`. See
#12857.
From my understanding there are two things happening here,
1. When `NULL` is passed as the expression (1st argument) we need to coerce
it to be the type of the default value(if one is provided).
2. Like you rightly pointed out in the comments, there is no need to cast
the default value to type of the 1st argument when it is `NULL`.
There is a dependency on the default value(3rd argument) in both of the
cases.
We can compute them separately like this,
```suggestion
let arg = coerce_default_type_if_null(args, out_data_type);
let default_value =
get_casted_value(get_scalar_value_from_args(args, 2)?,
out_data_type)?;
```
In the first part, we coerce `NULL` to `DataType` of default value when one
is provided,
```rust
/// For lead/lag window functions, when the expression (first argument)
/// is NULL, interpret it as a missing value which has the same type as
/// the default value.
///
/// See https://github.com/apache/datafusion/issues/12717
fn coerce_default_type_if_null(
args: &[Arc<dyn PhysicalExpr>],
expr_type: &DataType,
) -> Arc<dyn PhysicalExpr> {
let expr = Arc::clone(&args[0]);
let default_value = get_scalar_value_from_args(args, 2);
if !expr_type.is_null() {
expr
} else {
default_value
.unwrap()
.and_then(|value| {
ScalarValue::try_from(value.data_type().clone())
.map(|sv| Arc::new(Literal::new(sv)) as Arc<dyn
PhysicalExpr>)
.ok()
})
.unwrap_or(expr)
}
}
```
And in the second part we push the null check into `get_casted_value`
function. This benefits both lead/lag as they are the only users of
`get_casted_value`. It looks like this,
```rust
/// For lead/lag window functions, the types of the 1st and 3rd
/// argument should match. So here we attempt to cast the default
/// value to the same type as the 1st argument.
///
/// This also handles the case where the 1st argument could be
/// a NULL value in which case avoid casting the 3rd argument.
///
/// See https://github.com/apache/datafusion/issues/12717
fn get_casted_value(
default_value: Option<ScalarValue>,
dtype: &DataType,
) -> Result<ScalarValue> {
match (dtype.is_null(), &default_value) {
(true, _) => Ok(default_value.unwrap_or(ScalarValue::Null)),
(false, None) => ScalarValue::try_from(dtype.clone()),
(false, Some(v)) => {
if !v.data_type().is_null() {
v.cast_to(dtype)
} else {
ScalarValue::try_from(dtype.clone())
}
}
}
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]