alamb commented on code in PR #14356:
URL: https://github.com/apache/datafusion/pull/14356#discussion_r1936081492
##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -2645,6 +2643,106 @@ pub struct Union {
pub schema: DFSchemaRef,
}
+impl Union {
+ /// Constructs new Union instance deriving schema from inputs.
+ fn try_new(inputs: Vec<Arc<LogicalPlan>>) -> Result<Self> {
+ let schema = Self::derive_schema_from_inputs(&inputs, false)?;
+ Ok(Union { inputs, schema })
+ }
+
+ /// Constructs new Union instance deriving schema from inputs.
+ /// Inputs do not have to have matching types and produced schema will
+ /// take type from the first input.
+ pub fn try_new_with_loose_types(inputs: Vec<Arc<LogicalPlan>>) -> Result<Self> {
Review Comment:
Another possible name for this one is `try_new_with_coerce_types`, to
emphasize that it coerces the types.
##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -699,15 +699,13 @@ impl LogicalPlan {
}))
}
LogicalPlan::Union(Union { inputs, schema }) => {
- let input_schema = inputs[0].schema();
- // If inputs are not pruned do not change schema
- // TODO this seems wrong (shouldn't we always use the schema of the input?)
Review Comment:
👍
##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -2645,6 +2643,106 @@ pub struct Union {
pub schema: DFSchemaRef,
}
+impl Union {
+ /// Constructs new Union instance deriving schema from inputs.
+ fn try_new(inputs: Vec<Arc<LogicalPlan>>) -> Result<Self> {
+ let schema = Self::derive_schema_from_inputs(&inputs, false)?;
+ Ok(Union { inputs, schema })
+ }
+
+ /// Constructs new Union instance deriving schema from inputs.
+ /// Inputs do not have to have matching types and produced schema will
+ /// take type from the first input.
+ pub fn try_new_with_loose_types(inputs: Vec<Arc<LogicalPlan>>) -> Result<Self> {
+ let schema = Self::derive_schema_from_inputs(&inputs, true)?;
+ Ok(Union { inputs, schema })
+ }
+
+ /// Constructs new Union instance deriving schema from inputs.
+ ///
+ /// `loose_types` if true, inputs do not have to have matching types and produced schema will
+ /// take type from the first input. TODO this is not necessarily reasonable behavior.
+ fn derive_schema_from_inputs(
+ inputs: &[Arc<LogicalPlan>],
+ loose_types: bool,
+ ) -> Result<DFSchemaRef> {
+ if inputs.len() < 2 {
Review Comment:
I think there is already code that computes the coerced schema in the
analyzer:
https://github.com/apache/datafusion/blob/c077ef5638c76cfca7af1967497aae5d2fd069a0/datafusion/optimizer/src/analyzer/type_coercion.rs#L912
Can we reuse the same logic? Maybe we can move the coercion code here.
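To make the behavior under discussion concrete, here is a minimal, std-only sketch of deriving a union schema from its inputs. `DataType`, `Field`, and `derive_union_fields` are hypothetical stand-ins, not DataFusion's types or the code in this PR. The sketch assumes that strict mode rejects mismatched types, that loose mode takes the type from the first input (as the doc comment above says), and that a column is nullable if it is nullable in any input.

```rust
// Hypothetical stand-ins for illustration only; not DataFusion's actual types.
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    Null,
}

#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Derives the output fields of a union from the fields of each input.
/// Assumption: with `loose_types` the first input's type wins; otherwise
/// any type mismatch is an error. Nullability is OR-ed across all inputs.
fn derive_union_fields(
    inputs: &[Vec<Field>],
    loose_types: bool,
) -> Result<Vec<Field>, String> {
    if inputs.len() < 2 {
        return Err("Union requires at least two inputs".to_string());
    }
    let first = &inputs[0];
    if inputs.iter().any(|input| input.len() != first.len()) {
        return Err("Union inputs must have the same number of columns".to_string());
    }

    let mut fields = Vec::with_capacity(first.len());
    for (i, base) in first.iter().enumerate() {
        let mut nullable = base.nullable;
        for input in &inputs[1..] {
            let other = &input[i];
            if !loose_types && other.data_type != base.data_type {
                return Err(format!(
                    "column {} has mismatched types: {:?} vs {:?}",
                    i, base.data_type, other.data_type
                ));
            }
            // A column that is nullable in any input is nullable in the output.
            nullable |= other.nullable;
        }
        fields.push(Field {
            name: base.name.clone(),
            data_type: base.data_type.clone(),
            nullable,
        });
    }
    Ok(fields)
}
```

Real coercion, as in the analyzer code linked above, would compute a common type rather than take the first input's; that is the part that might be worth sharing.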
##########
datafusion/sqllogictest/test_files/union.slt:
##########
@@ -836,3 +836,18 @@ physical_plan
# Clean up after the test
statement ok
drop table aggregate_test_100;
+
+# test for https://github.com/apache/datafusion/issues/14352
Review Comment:
I verified this test fails without the code in this PR:
```diff
External error: query result mismatch:
[SQL] SELECT
a,
a IS NOT NULL
FROM (
-- second column, even though it's not selected, was necessary to reproduce the bug linked above
SELECT 'foo' AS a, 3 AS b
UNION ALL
SELECT NULL AS a, 4 AS b
)
[Diff] (-expected|+actual)
- NULL false
+ NULL true
foo true
at test_files/union.slt:841
```
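One plausible reading of the `NULL true` row, stated as an assumption rather than something verified here: if the union's schema declares `a` as non-nullable (taken from the `'foo'` branch), a simplifier may fold `a IS NOT NULL` to a constant `true`. The sketch below uses a hypothetical `Expr` enum and `simplify` function, not DataFusion's optimizer API, to show that kind of folding.

```rust
// Hypothetical stand-ins for illustration only; not DataFusion's optimizer API.
#[derive(Debug, PartialEq)]
enum Expr {
    /// A column reference, carrying the nullability declared by the schema.
    Column { nullable: bool },
    IsNotNull(Box<Expr>),
    Literal(bool),
}

/// Folds `IS NOT NULL` to `true` when the schema says the column cannot be null.
/// This is only sound if the declared nullability is correct.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::IsNotNull(inner) => match *inner {
            Expr::Column { nullable: false } => Expr::Literal(true),
            other => Expr::IsNotNull(Box::new(other)),
        },
        other => other,
    }
}
```

With a schema that wrongly marks the column non-nullable, `simplify` returns `Literal(true)` for every row, including the `NULL` one, which matches the mismatch above.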