neilconway commented on code in PR #21715:
URL: https://github.com/apache/datafusion/pull/21715#discussion_r3255319904
##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -2434,7 +2437,14 @@ impl SubqueryAlias {
// Requalify fields with the new `alias`.
let fields = plan.schema().fields().clone();
let meta_data = plan.schema().metadata().clone();
- let func_dependencies =
plan.schema().functional_dependencies().clone();
+ // Recursive queries do not expose the anchor's functional
dependencies to
+ // the outer schema — the recursive term can produce rows that violate
+ // those dependencies, so they are intentionally dropped here.
+ let func_dependencies = if is_recursive_query {
+ FunctionalDependencies::empty()
+ } else {
+ plan.schema().functional_dependencies().clone()
+ };
Review Comment:
Can we mention this change in the PR description?
##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -354,7 +354,6 @@ impl LogicalPlan {
LogicalPlan::Ddl(ddl) => ddl.schema(),
LogicalPlan::Unnest(Unnest { schema, .. }) => schema,
LogicalPlan::RecursiveQuery(RecursiveQuery { static_term, .. }) =>
{
- // we take the schema of the static term as the schema of the
entire recursive query
Review Comment:
Why remove this?
##########
datafusion/sqllogictest/test_files/group_by.slt:
##########
@@ -3565,33 +3565,27 @@ SELECT r.sn, r.amount, SUM(r.amount)
GROUP BY r.sn
ORDER BY r.sn
-# left semi join should propagate constraint of left side as is.
-query IRR
+# left semi join with a nullable UNIQUE key cannot safely propagate the
Review Comment:
I wonder if it would be better to mark the UNIQUE column as `NOT NULL`, so
we don't lose the intent of the original test. (Here and below)
##########
datafusion/common/src/functional_dependencies.rs:
##########
Review Comment:
Seems like `Constraint::Unique` can produce non-nullable FDs now, so we
should update this comment?
##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -2434,7 +2437,14 @@ impl SubqueryAlias {
// Requalify fields with the new `alias`.
let fields = plan.schema().fields().clone();
let meta_data = plan.schema().metadata().clone();
- let func_dependencies =
plan.schema().functional_dependencies().clone();
+ // Recursive queries do not expose the anchor's functional
dependencies to
+ // the outer schema — the recursive term can produce rows that violate
+ // those dependencies, so they are intentionally dropped here.
Review Comment:
Seems a bit fragile to leave FDs on `plan.schema()` but only strip them out
here when they are being used. Should there be FDs on `plan.schema()` in the
first place?
BTW this might be affected by / related to #22037
##########
datafusion/common/src/functional_dependencies.rs:
##########
@@ -484,9 +505,12 @@ pub fn aggregate_functional_dependencies(
if !group_by_expr_names.is_empty() {
let count = group_by_expr_names.len();
let source_indices = (0..count).collect::<Vec<_>>();
- let nullable = source_indices
- .iter()
- .any(|idx| aggr_fields[*idx].is_nullable());
+ // Aggregation with GROUP BY always produces unique output rows for
+ // each distinct combination of GROUP BY keys. The nullable flag is
+ // set to false here so that subsequent expansion (e.g. a second
+ // GROUP BY on the aggregate output) is never blocked by source
+ // field nullability.
+ let nullable = false;
Review Comment:
I'm trying to understand the motivation for this change. Can you add a test
case that covers this behavior, perhaps?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]