alamb opened a new issue, #12650: URL: https://github.com/apache/datafusion/issues/12650
### Is your feature request related to a problem or challenge?

@ion-elgreco [asked in Discord](https://discord.com/channels/885562378132000778/1166447479609376850/1288733944530993194)

> Does datafusion support a more relaxed Union where the schema can be in a different order? Akin to Polars.concat

See the documentation for [polars.concat](https://docs.pola.rs/api/python/stable/reference/api/polars.concat.html).

[`DataFrame::union`](https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html#method.union) requires the inputs to have the same schema.

### Describe the solution you'd like

_No response_

### Describe alternatives you've considered

# `DataFrame::concat`

Add a `DataFrame::concat` method that works like this:

```rust
let df1 = ...; // DataFrame with schema {a: int, b: string}
let df2 = ...; // DataFrame with schema {b: string, a: int}
let df3 = df1.concat(df2); // DataFrame with schema {a: int, b: string}, all rows from df1 before df2
```

Implementing this might be somewhat complicated, as there is no existing LogicalPlan that could do this easily.

One way to implement this could be to add a fake column (`__child_number` perhaps) to df1 and df2 and have the plan be:

```rust
let df1 = df1.add_column("__child_number", 1); // add new __child_number column
let df2 = df2.add_column("__child_number", 2); // add new __child_number column
let df3 = df1
    .union_with_reorder(df2) // see below for union with reorder
    .order_by("__child_number")
    .project(...); // remove __child_number column
```

# `DataFrame::union_with_reorder_schema`

```rust
let df1 = ...; // DataFrame with schema {a: int, b: string}
let df2 = ...; // DataFrame with schema {b: string, a: int}
let df3 = df1.union_with_reorder_schema(df2); // DataFrame with schema {a: int, b: string}, rows from df1 and df2 interleaved (like union)
```

We could implement this with just a Projection that reorders the input schemas and then uses the existing Union.

# Change semantics of `DataFrame::union` to do reordering

Another thing we could do is change the semantics of Union to do the reordering, but that may have unintended consequences.

I don't think there is a DataFrame-level implementation of that functionality, though I think it would be straightforward to add (the DataFrame could add a projection to the inputs to rearrange the column order to match).

### Additional context

We should double check with our DataFrame experts like @timsaucer and @Omega359 whether this is a reasonable API.
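To make the "projection that reorders the input, then union" idea concrete, here is a minimal sketch in plain Rust. It deliberately does not use DataFusion: `Table`, `projection_indices`, and `union_by_name` are hypothetical names invented for this illustration, and the toy table stores every value as a `String`. It only shows the name-based index mapping that such a projection would need to compute, assuming column names are unique within each input.

```rust
use std::collections::HashMap;

/// Toy stand-in for a DataFrame (hypothetical, not a DataFusion type):
/// column names plus row-major values.
#[derive(Debug, Clone, PartialEq)]
struct Table {
    columns: Vec<String>,
    rows: Vec<Vec<String>>,
}

/// For each column in `target`, find the index of the same-named column in
/// `source`. Returns `None` unless `source` is a reordering of `target`
/// (same number of columns, same names, each used exactly once).
fn projection_indices(target: &[String], source: &[String]) -> Option<Vec<usize>> {
    if target.len() != source.len() {
        return None;
    }
    let by_name: HashMap<&str, usize> = source
        .iter()
        .enumerate()
        .map(|(i, name)| (name.as_str(), i))
        .collect();
    // Look up every target column by name; any missing name yields None.
    let indices: Vec<usize> = target
        .iter()
        .map(|name| by_name.get(name.as_str()).copied())
        .collect::<Option<Vec<usize>>>()?;
    // Reject duplicate names: every source index must be used exactly once.
    let mut seen = vec![false; source.len()];
    for &i in &indices {
        if seen[i] {
            return None;
        }
        seen[i] = true;
    }
    Some(indices)
}

/// Reorder the columns of `right` to match `left`, then append its rows.
/// Assumes every row has as many values as its table has columns.
fn union_by_name(left: &Table, right: &Table) -> Option<Table> {
    let indices = projection_indices(&left.columns, &right.columns)?;
    let mut rows = left.rows.clone();
    for row in &right.rows {
        // This is the "projection": pick values in the left table's column order.
        rows.push(indices.iter().map(|&i| row[i].clone()).collect());
    }
    Some(Table {
        columns: left.columns.clone(),
        rows,
    })
}
```

With a left table whose columns are `a, b` and a right table whose columns are `b, a`, `projection_indices` yields `[1, 0]` and `union_by_name` returns a table in `a, b` order. Because this toy version appends rows, the left rows come first, which matches the proposed `concat` ordering; a real Union in a query engine makes no such ordering guarantee, which is why the issue suggests the `__child_number` column.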
