alamb opened a new issue, #12650:
URL: https://github.com/apache/datafusion/issues/12650

   ### Is your feature request related to a problem or challenge?
   
   @ion-elgreco [asked in 
Discord](https://discord.com/channels/885562378132000778/1166447479609376850/1288733944530993194)
   
   > Does datafusion support a more relaxed Union where the schema can be in a 
different order? Akin to Polars.concat
   
   The documentation for 
[polars.concat](https://docs.pola.rs/api/python/stable/reference/api/polars.concat.html)
   
   
[`DataFrame::union`](https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html#method.union)
 requires the inputs to have the same schema
   
   ### Describe the solution you'd like
   
   _No response_
   
   ### Describe alternatives you've considered
   
   # `DataFrame::concat`
   
   Add a `DataFrame::concat` method that works like this:
   ```rust
     let df1 = ...; // DataFrame with schema {a: int, b: string}
     let df2 = ...; // DataFrame with schema {b: string, a: int}
     // DataFrame with schema {a: int, b: string}, all rows from df1 before df2
     let df3 = df1.concat(df2);
   ```
   
   
   Implementing this might be somewhat complicated (as there is no existing 
LogicalPlan that could do this easily)
   
   One way to implement this could be to add a synthetic column (`__child_number` perhaps) to df1 and df2 and have the plan be:
   
   ```rust
   let df1 = df1.add_column("__child_number", 1); // add new __child_number column
   let df2 = df2.add_column("__child_number", 2); // add new __child_number column
   let df3 = df1
     .union_with_reorder_schema(df2) // see below for union with reorder
     .order_by("__child_number")
     .project(...); // remove __child_number column
   ```
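
   The tag, union, sort, drop sequence above can be illustrated without any DataFusion types. The following is a minimal std-only sketch, not DataFusion code: rows are modeled as plain `i64` values, and `tag_rows` and `concat_in_order` are hypothetical helper names chosen for illustration. The union step is deliberately written to emit the second input first, to show that the sort on the tag restores the "df1 before df2" order.

   ```rust
   /// Tag every row with the number of the input it came from
   /// (the role the `__child_number` column plays in the plan above).
   fn tag_rows(rows: Vec<i64>, child_number: u32) -> Vec<(u32, i64)> {
       rows.into_iter().map(|row| (child_number, row)).collect()
   }

   /// Concatenate two inputs so that all rows of `first` precede all rows
   /// of `second`, even if the union step itself does not preserve that order.
   fn concat_in_order(first: Vec<i64>, second: Vec<i64>) -> Vec<i64> {
       let first = tag_rows(first, 1);
       let second = tag_rows(second, 2);

       // Stand-in for a union with no ordering guarantee:
       // here the second input is deliberately emitted first.
       let mut unioned: Vec<(u32, i64)> = second.into_iter().chain(first).collect();

       // Equivalent of ordering by `__child_number`. `sort_by_key` is stable,
       // so rows keep their relative order within each input.
       unioned.sort_by_key(|&(child_number, _)| child_number);

       // Equivalent of the final projection that drops the tag column.
       unioned.into_iter().map(|(_, row)| row).collect()
   }
   ```

   Note that the within-input ordering here relies on Rust's stable sort. That is a property of this sketch only and is not a claim about how a query engine's sort would behave.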
   
   
   # `DataFrame::union_with_reorder_schema`
   
   ```rust
     let df1 = ...; // DataFrame with schema {a: int, b: string}
     let df2 = ...; // DataFrame with schema {b: string, a: int}
     // DataFrame with schema {a: int, b: string}, rows from df1 and df2 interleaved (like union)
     let df3 = df1.union_with_reorder_schema(df2);
   ```
   
   This could be implemented with just a Projection that reorders the input columns into a common order, followed by the existing Union.
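
   To make the projection-then-union idea concrete, here is a minimal std-only sketch that does not use DataFusion types. Schemas are modeled as slices of column names and rows as `Vec<String>`. The function names `reorder_indices` and `union_with_reorder` are hypothetical, and the sketch assumes column names are unique within each schema.

   ```rust
   /// For each column name in `target`, find its position in `input`.
   /// Returns `None` if the lengths differ or a name is missing from `input`.
   /// Assumes names are unique within each schema.
   fn reorder_indices(target: &[&str], input: &[&str]) -> Option<Vec<usize>> {
       if target.len() != input.len() {
           return None;
       }
       target
           .iter()
           .map(|name| input.iter().position(|other| other == name))
           .collect()
   }

   /// Reorder the right-hand rows into the left-hand column order (the
   /// "projection"), then append them to the left-hand rows (the "union").
   fn union_with_reorder(
       left_schema: &[&str],
       mut left_rows: Vec<Vec<String>>,
       right_schema: &[&str],
       right_rows: Vec<Vec<String>>,
   ) -> Option<Vec<Vec<String>>> {
       let indices = reorder_indices(left_schema, right_schema)?;
       for row in right_rows {
           left_rows.push(indices.iter().map(|&i| row[i].clone()).collect());
       }
       Some(left_rows)
   }
   ```

   With a left schema of `["a", "b"]` and a right schema of `["b", "a"]`, the computed projection is `[1, 0]`, so each right-hand row is swapped into `a, b` order before being appended. Schemas with different column names yield `None` rather than a silently misaligned result.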
   
   
   # Change semantics of `DataFrame::union` to do reordering
   Another thing we could do is to change the semantics of Union to do the 
reordering, but that may have unintended consequences
   
   I don't think there is a DataFrame-level implementation of that functionality, though I think it would be straightforward to add (the DataFrame could add a projection to the inputs to rearrange the column order to match).
   
   
   ### Additional context
   
   We should double check with our DataFrame experts like @timsaucer and @Omega359 whether this is a reasonable API.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

