TheBuilderJR commented on issue #2326:
URL: https://github.com/apache/datafusion/issues/2326#issuecomment-2287085675

   Right now DataFusion doesn't support struct evolution very well. Imagine you 
have a struct named `customData` with a field `someOptionEnabled` in one Parquet 
file, and later you add a new field `newAddedOption` to the `customData` struct 
in another Parquet file. Currently, when you try to `SELECT * FROM table`, you 
get this error:
   
   ```
   {"message":"Failed to collect DataFrame batches: Plan(\"Cannot cast file 
schema field customData of type Struct([Field { name: 
\\\"someOptionEnabled\\\", data_type: Boolean, nullable: true, dict_id: 0, 
dict_is_ordered: false, metadata: {} }]) to table schema field of type 
Struct([Field { name: \\\"someOptionEnabled\\\", data_type: Boolean, nullable: 
true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: 
\\\"newAddedOption\\\", data_type: Float64, nullable: true, dict_id: 0, 
dict_is_ordered: false, metadata: {} }])\")","status":"error"}
   ```
   
   Feels like we should handle this more gracefully. cc @alamb 
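
   To make "more gracefully" concrete, here is a rough sketch of the behavior I 
have in mind: when a file's struct is missing fields that the table schema has, 
fill them with nulls instead of failing the cast. This is plain Python over 
dicts, not DataFusion or Arrow code. The `adapt_value` helper and the dict-based 
schema representation are hypothetical, and it assumes every struct field is 
nullable (as they are in the error above).

```python
# Hypothetical sketch: schemas are plain dicts, not Arrow or DataFusion types.
# A struct type is a dict of field name -> type; a primitive type is a string.

def adapt_value(value, file_type, table_type):
    """Reshape a value read with file_type so that it matches table_type."""
    if isinstance(table_type, dict):
        if value is None:
            return None  # a null struct stays null
        if not isinstance(file_type, dict):
            raise TypeError(f"cannot cast {file_type} to a struct")
        adapted = {}
        for name, field_type in table_type.items():
            if name in file_type:
                # Field exists in the file: recurse so nested structs evolve too.
                adapted[name] = adapt_value(
                    value.get(name), file_type[name], field_type
                )
            else:
                # Field was added after this file was written: fill with null.
                adapted[name] = None
        return adapted
    if file_type != table_type:
        raise TypeError(f"cannot cast {file_type} to {table_type}")
    return value


# Schema of the older file: customData has only someOptionEnabled.
old_file_schema = {"customData": {"someOptionEnabled": "Boolean"}}

# Table schema after the newer file added newAddedOption.
table_schema = {
    "customData": {
        "someOptionEnabled": "Boolean",
        "newAddedOption": "Float64",
    }
}

old_row = {"customData": {"someOptionEnabled": True}}
new_row = adapt_value(old_row, old_file_schema, table_schema)
print(new_row)
# {'customData': {'someOptionEnabled': True, 'newAddedOption': None}}
```

   Rows from the older file come back with `newAddedOption` set to null, while 
a genuine type mismatch on a field that exists in both schemas still raises an 
error.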
   
   I'm happy to make contributions if someone can point me to the right places 
to look.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

