findepi commented on code in PR #14356: URL: https://github.com/apache/datafusion/pull/14356#discussion_r1937072353
########## datafusion/expr/src/logical_plan/plan.rs: ########## @@ -2645,6 +2643,106 @@ pub struct Union { pub schema: DFSchemaRef, } +impl Union { + /// Constructs new Union instance deriving schema from inputs. + fn try_new(inputs: Vec<Arc<LogicalPlan>>) -> Result<Self> { + let schema = Self::derive_schema_from_inputs(&inputs, false)?; + Ok(Union { inputs, schema }) + } + + /// Constructs new Union instance deriving schema from inputs. + /// Inputs do not have to have matching types and produced schema will + /// take type from the first input. + pub fn try_new_with_loose_types(inputs: Vec<Arc<LogicalPlan>>) -> Result<Self> { + let schema = Self::derive_schema_from_inputs(&inputs, true)?; + Ok(Union { inputs, schema }) + } + + /// Constructs new Union instance deriving schema from inputs. + /// + /// `loose_types` if true, inputs do not have to have matching types and produced schema will + /// take type from the first input. TODO this is not necessarily reasonable behavior. + fn derive_schema_from_inputs( + inputs: &[Arc<LogicalPlan>], + loose_types: bool, + ) -> Result<DFSchemaRef> { + if inputs.len() < 2 { Review Comment: > f I want to construct a UNION logical plan with different types that are coercible (be it by current builtin rules or future user-defined rules), then I would use the `Union::try_new_with_loose_types` and have the analyzer pass handle coercion. Is this right? Correct. Note: this is not my design. It was exactly the same before the PR. Just the code moved around. > Then what exactly is the use case for the Union::try_new? Since it's used in the schema recompute which can occur after the analyzer type coercion. "schema recompute" is an overloaded term if it runs after analyzer, it doesn't have to do any type coercion. In fact, it MUST NOT do any type coercion (https://github.com/apache/datafusion/issues/14296#issuecomment-2625773366). And in fact the `try_new` does not do any coercions. It's still needed to do column pruning. In fact, IMO we should remove "schema recompute" from optimizer: https://github.com/apache/datafusion/issues/14357. For column pruning we should explicitly prune inputs of union and the unin itself using the same set of "required columns/indices". No need for a generic "recompute schema". -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org