alamb commented on code in PR #11989:
URL: https://github.com/apache/datafusion/pull/11989#discussion_r1719999132
##########
datafusion/expr/src/expr_schema.rs:
##########
@@ -328,10 +328,45 @@ impl ExprSchemable for Expr {
Ok(true)
}
}
+ Expr::WindowFunction(WindowFunction { fun, .. }) => {
+ match fun {
+ WindowFunctionDefinition::BuiltInWindowFunction(func) => {
+ if func.name() == "ROW_NUMBER"
+ || func.name() == "RANK"
+ || func.name() == "NTILE"
+ || func.name() == "CUME_DIST"
+ {
+ Ok(false)
+ } else {
+ Ok(true)
+ }
+ }
+ WindowFunctionDefinition::AggregateUDF(func) => {
+ // TODO: UDF should be able to customize nullability
+ if func.name() == "count" {
+ // TODO: there is issue unsolved for count with
window, should return false
+ Ok(true)
+ } else {
+ Ok(true)
+ }
+ }
+ _ => Ok(true),
+ }
+ }
+ Expr::ScalarFunction(ScalarFunction { func, args }) => {
+ // If all the element in coalesce is non-null, the result is
non-null
Review Comment:
We should probably add an API to ScalarUDFImpl to signal its
null/non-nullness (as a follow on PR) instead of hard coding this function name
```
func.is_nullable(args)
```
##########
datafusion/functions-aggregate-common/src/aggregate.rs:
##########
@@ -171,6 +171,9 @@ pub trait AggregateExpr: Send + Sync + Debug +
PartialEq<dyn Any> {
fn get_minmax_desc(&self) -> Option<(Field, bool)> {
None
}
+
+ /// Get function's name, for example `count(x)` returns `count`
+ fn func_name(&self) -> &str;
Review Comment:
is there a reason this isn't `name()` ? `func_name` is fine, it just seems
inconsistent with the rest of the code
##########
datafusion/expr/src/udaf.rs:
##########
@@ -196,6 +196,10 @@ impl AggregateUDF {
self.inner.state_fields(args)
}
+ pub fn fields(&self, args: StateFieldsArgs) -> Result<Field> {
Review Comment:
Could we document this function and what it is for (also in
AggregateUdfImpl)?
Also, the name is strange to me -- it is `fields` but it returns a single
`Field` and the corresponding function on `AggregateUDFImpl` is called `field`
(no `s`) 🤔
##########
datafusion/optimizer/src/analyzer/type_coercion.rs:
##########
@@ -833,39 +841,47 @@ fn coerce_union_schema(inputs: &[Arc<LogicalPlan>]) ->
Result<DFSchema> {
plan_schema.fields().len()
);
}
- // coerce data type and nullablity for each field
- for (union_datatype, union_nullable, plan_field) in izip!(
- union_datatypes.iter_mut(),
- union_nullabilities.iter_mut(),
- plan_schema.fields()
- ) {
- let coerced_type =
- comparison_coercion(union_datatype,
plan_field.data_type()).ok_or_else(
- || {
- plan_datafusion_err!(
- "Incompatible inputs for Union: Previous inputs
were \
- of type {}, but got incompatible type {} on column
'{}'",
- union_datatype,
- plan_field.data_type(),
- plan_field.name()
- )
- },
- )?;
- *union_datatype = coerced_type;
- *union_nullable = *union_nullable || plan_field.is_nullable();
+
+ // Safety: Length is checked
+ unsafe {
Review Comment:
I think this unsafe block is unecessary -- this isn't a performance
critical piece of code. I think `izip` or just manuallly `zip`ping three times
would be better
##########
datafusion/expr/src/expr_schema.rs:
##########
@@ -328,10 +328,45 @@ impl ExprSchemable for Expr {
Ok(true)
}
}
+ Expr::WindowFunction(WindowFunction { fun, .. }) => {
Review Comment:
Is this change required for this PR or is it a "drive by" improvement?
##########
datafusion/core/src/physical_planner.rs:
##########
@@ -670,6 +670,10 @@ impl DefaultPhysicalPlanner {
let input_exec = children.one()?;
let physical_input_schema = input_exec.schema();
let logical_input_schema = input.as_ref().schema();
+ let physical_input_schema_from_logical: Arc<Schema> =
+ logical_input_schema.as_ref().clone().into();
+
+ debug_assert_eq!(physical_input_schema_from_logical,
physical_input_schema, "Physical input schema should be the same as the one
converted from logical input schema. Please file an issue or send the PR");
Review Comment:
Nice!
Did you consider making this function return an `internal_error` rather than
debug_assert ?
If we are concerned about breaking existing tests, we could add a config
setting like `datafusion.optimizer.skip_failed_rules` to let users bypass the
check
##########
datafusion/physical-expr/src/window/built_in.rs:
##########
@@ -97,6 +97,10 @@ impl BuiltInWindowExpr {
}
impl WindowExpr for BuiltInWindowExpr {
+ fn func_name(&self) -> Result<&str> {
+ not_impl_err!("function name not determined")
Review Comment:
why wouldn't we implement func_name for a built in window function 🤔
##########
datafusion/expr/src/expr_schema.rs:
##########
@@ -328,10 +328,45 @@ impl ExprSchemable for Expr {
Ok(true)
}
}
+ Expr::WindowFunction(WindowFunction { fun, .. }) => {
+ match fun {
+ WindowFunctionDefinition::BuiltInWindowFunction(func) => {
+ if func.name() == "ROW_NUMBER"
+ || func.name() == "RANK"
+ || func.name() == "NTILE"
+ || func.name() == "CUME_DIST"
+ {
+ Ok(false)
+ } else {
+ Ok(true)
+ }
+ }
+ WindowFunctionDefinition::AggregateUDF(func) => {
+ // TODO: UDF should be able to customize nullability
+ if func.name() == "count" {
+ // TODO: there is issue unsolved for count with
window, should return false
Review Comment:
Perhaps we can file a ticket to track this -- ideally it would eventually be
part of the window function definition itself rather than relying on names
##########
datafusion/physical-expr/src/window/aggregate.rs:
##########
@@ -80,6 +80,14 @@ impl WindowExpr for PlainAggregateWindowExpr {
}
fn field(&self) -> Result<Field> {
+ // TODO: Fix window function to always return non-null for count
Review Comment:
I don't understand this comment -- can we please file a ticket to track it
(and add the ticket reference to the comments)?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]