alamb commented on code in PR #14356: URL: https://github.com/apache/datafusion/pull/14356#discussion_r1936081492
########## datafusion/expr/src/logical_plan/plan.rs: ########## @@ -2645,6 +2643,106 @@ pub struct Union { pub schema: DFSchemaRef, } +impl Union { + /// Constructs new Union instance deriving schema from inputs. + fn try_new(inputs: Vec<Arc<LogicalPlan>>) -> Result<Self> { + let schema = Self::derive_schema_from_inputs(&inputs, false)?; + Ok(Union { inputs, schema }) + } + + /// Constructs new Union instance deriving schema from inputs. + /// Inputs do not have to have matching types and produced schema will + /// take type from the first input. + pub fn try_new_with_loose_types(inputs: Vec<Arc<LogicalPlan>>) -> Result<Self> { Review Comment: Another name for this one is `try_new_with_coerce_types` to emphasize it is coercing the types ########## datafusion/expr/src/logical_plan/plan.rs: ########## @@ -699,15 +699,13 @@ impl LogicalPlan { })) } LogicalPlan::Union(Union { inputs, schema }) => { - let input_schema = inputs[0].schema(); - // If inputs are not pruned do not change schema - // TODO this seems wrong (shouldn't we always use the schema of the input?) Review Comment: 👍 ########## datafusion/expr/src/logical_plan/plan.rs: ########## @@ -2645,6 +2643,106 @@ pub struct Union { pub schema: DFSchemaRef, } +impl Union { + /// Constructs new Union instance deriving schema from inputs. + fn try_new(inputs: Vec<Arc<LogicalPlan>>) -> Result<Self> { + let schema = Self::derive_schema_from_inputs(&inputs, false)?; + Ok(Union { inputs, schema }) + } + + /// Constructs new Union instance deriving schema from inputs. + /// Inputs do not have to have matching types and produced schema will + /// take type from the first input. + pub fn try_new_with_loose_types(inputs: Vec<Arc<LogicalPlan>>) -> Result<Self> { + let schema = Self::derive_schema_from_inputs(&inputs, true)?; + Ok(Union { inputs, schema }) + } + + /// Constructs new Union instance deriving schema from inputs. + /// + /// `loose_types` if true, inputs do not have to have matching types and produced schema will + /// take type from the first input. TODO this is not necessarily reasonable behavior. + fn derive_schema_from_inputs( + inputs: &[Arc<LogicalPlan>], + loose_types: bool, + ) -> Result<DFSchemaRef> { + if inputs.len() < 2 { Review Comment: I think there is already code that computes the coerced schema in the analyzer: https://github.com/apache/datafusion/blob/c077ef5638c76cfca7af1967497aae5d2fd069a0/datafusion/optimizer/src/analyzer/type_coercion.rs#L912 Can we reuse the same logic? Maybe we can move the coercion code here ########## datafusion/sqllogictest/test_files/union.slt: ########## @@ -836,3 +836,18 @@ physical_plan # Clean up after the test statement ok drop table aggregate_test_100; + +# test for https://github.com/apache/datafusion/issues/14352 Review Comment: I verified this test fails without the code in this PR: ```diff External error: query result mismatch: [SQL] SELECT a, a IS NOT NULL FROM ( -- second column, even though it's not selected, was necessary to reproduce the bug linked above SELECT 'foo' AS a, 3 AS b UNION ALL SELECT NULL AS a, 4 AS b ) [Diff] (-expected|+actual) - NULL false + NULL true foo true at test_files/union.slt:841 ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org