alamb commented on code in PR #14356:
URL: https://github.com/apache/datafusion/pull/14356#discussion_r1936081492


##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -2645,6 +2643,106 @@ pub struct Union {
     pub schema: DFSchemaRef,
 }
 
+impl Union {
+    /// Constructs new Union instance deriving schema from inputs.
+    fn try_new(inputs: Vec<Arc<LogicalPlan>>) -> Result<Self> {
+        let schema = Self::derive_schema_from_inputs(&inputs, false)?;
+        Ok(Union { inputs, schema })
+    }
+
+    /// Constructs new Union instance deriving schema from inputs.
+    /// Inputs do not have to have matching types and produced schema will
+    /// take type from the first input.
+    pub fn try_new_with_loose_types(inputs: Vec<Arc<LogicalPlan>>) -> 
Result<Self> {

Review Comment:
   Another name for this one is  `try_new_with_coerce_types` to emphasize it is 
coercing the types



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -699,15 +699,13 @@ impl LogicalPlan {
                 }))
             }
             LogicalPlan::Union(Union { inputs, schema }) => {
-                let input_schema = inputs[0].schema();
-                // If inputs are not pruned do not change schema
-                // TODO this seems wrong (shouldn't we always use the schema 
of the input?)

Review Comment:
   👍 



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -2645,6 +2643,106 @@ pub struct Union {
     pub schema: DFSchemaRef,
 }
 
+impl Union {
+    /// Constructs new Union instance deriving schema from inputs.
+    fn try_new(inputs: Vec<Arc<LogicalPlan>>) -> Result<Self> {
+        let schema = Self::derive_schema_from_inputs(&inputs, false)?;
+        Ok(Union { inputs, schema })
+    }
+
+    /// Constructs new Union instance deriving schema from inputs.
+    /// Inputs do not have to have matching types and produced schema will
+    /// take type from the first input.
+    pub fn try_new_with_loose_types(inputs: Vec<Arc<LogicalPlan>>) -> 
Result<Self> {
+        let schema = Self::derive_schema_from_inputs(&inputs, true)?;
+        Ok(Union { inputs, schema })
+    }
+
+    /// Constructs new Union instance deriving schema from inputs.
+    ///
+    /// `loose_types` if true, inputs do not have to have matching types and 
produced schema will
+    /// take type from the first input. TODO this is not necessarily 
reasonable behavior.
+    fn derive_schema_from_inputs(
+        inputs: &[Arc<LogicalPlan>],
+        loose_types: bool,
+    ) -> Result<DFSchemaRef> {
+        if inputs.len() < 2 {

Review Comment:
   I think there is already code that computes the coerced schema in the 
analyzer: 
https://github.com/apache/datafusion/blob/c077ef5638c76cfca7af1967497aae5d2fd069a0/datafusion/optimizer/src/analyzer/type_coercion.rs#L912
   
   Can we reuse the same logic? Maybe we can move the coercion  code here



##########
datafusion/sqllogictest/test_files/union.slt:
##########
@@ -836,3 +836,18 @@ physical_plan
 # Clean up after the test
 statement ok
 drop table aggregate_test_100;
+
+# test for https://github.com/apache/datafusion/issues/14352

Review Comment:
   I verified this test fails without the code in this PR:
   
   ```diff
   External error: query result mismatch:
   [SQL] SELECT
       a,
       a IS NOT NULL
   FROM (
       -- second column, even though it's not selected, was necessary to 
reproduce the bug linked above
       SELECT 'foo' AS a, 3 AS b
       UNION ALL
       SELECT NULL AS a, 4 AS b
   )
   [Diff] (-expected|+actual)
   -   NULL false
   +   NULL true
       foo true
   at test_files/union.slt:841
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to