liamphmurphy opened a new issue, #15338:
URL: https://github.com/apache/datafusion/issues/15338

   ### Describe the bug
   
   I hit this bug during schema evolution on Delta tables written with the 
`delta-rs` library. Whenever the schema evolves on a table that contains a 
field holding a list of structs, DataFusion returns this error: 
   
   ```
   This feature is not implemented: Unsupported CAST from Struct([Field { name: 
"properties", data_type: Struct([Field { name: "someNewField", data_type: Utf8, 
nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { 
name: "fields", data_type: List(Field { name: "item", data_type: Struct([Field 
{ name: "messageId", data_type: Utf8, nullable: true, dict_id: 0, 
dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, 
dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, 
dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, 
dict_is_ordered: false, metadata: {} }]) to Struct([Field { name: "properties", 
data_type: Struct([Field { name: "fields", data_type: List(Field { name: 
"element", data_type: Struct([Field { name: "messageId", data_type: Utf8, 
nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: 
true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, 
dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: 
"someNewField", data_type: Utf8, 
nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: 
true, dict_id: 0, dict_is_ordered: false, metadata: {} }])
   ```
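
The two types in the message are hard to compare by eye. The sketch below models them in plain Python and lists where they differ. It is only an illustration: `diff_types` and the tuple encoding are hypothetical helpers, not part of DataFusion, Arrow, or delta-rs, and the `event` path prefix assumes the outer struct is the `event` column from the reproduction.

```python
# Plain-Python model of the two struct types named in the error message.
# A struct is a list of (field_name, field_type) pairs, a list type is a
# ("list", item_field_name, item_type) tuple, and a leaf is a type-name string.
# Hypothetical illustration only; this is not a DataFusion or Arrow API.

MESSAGE = [("messageId", "Utf8")]

# Source type of the CAST (first Struct in the error message).
source_type = [
    ("properties", [
        ("someNewField", "Utf8"),
        ("fields", ("list", "item", MESSAGE)),
    ]),
]

# Target type of the CAST (second Struct in the error message).
target_type = [
    ("properties", [
        ("fields", ("list", "element", MESSAGE)),
        ("someNewField", "Utf8"),
    ]),
]


def diff_types(src, dst, path="event"):
    """Return human-readable differences between two modeled types."""
    found = []
    if isinstance(src, list) and isinstance(dst, list):
        # Both are structs: compare field names in order, then recurse by name.
        src_names = [name for name, _ in src]
        dst_names = [name for name, _ in dst]
        if src_names != dst_names:
            found.append(f"{path}: fields {src_names} vs {dst_names}")
        dst_by_name = dict(dst)
        for name, child in src:
            if name in dst_by_name:
                found.extend(diff_types(child, dst_by_name[name], f"{path}.{name}"))
    elif isinstance(src, tuple) and isinstance(dst, tuple):
        # Both are lists: compare the item field name, then the item type.
        _, src_item, src_child = src
        _, dst_item, dst_child = dst
        if src_item != dst_item:
            found.append(f"{path}: list item name {src_item!r} vs {dst_item!r}")
        found.extend(diff_types(src_child, dst_child, f"{path}[]"))
    elif src != dst:
        found.append(f"{path}: {src} vs {dst}")
    return found


differences = diff_types(source_type, target_type)
for line in differences:
    print(line)
```

Under this model the types differ in two places: the order of the fields inside `properties`, and the name of the list item field (`item` versus `element`).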
   
   ### To Reproduce
   
   The Python code below uses delta-rs (which is currently on DataFusion 46) 
and reproduces the error:
   
   ```python
   import pyarrow as pa
   from deltalake import write_deltalake
   
   # Define the path for the Delta table
   delta_table_path = "./datafusion-repro-test-table"
   
   # Define the data for the first write
   data_first_write = [
       {
           "uid": "ws_2",
           "event": {
               "properties": {
                   "fields": [
                       {
                           "messageId": "veniam sed et elit adipisicing"
                       }
                   ],
               },
           }
       }
   ]
   
   schema = pa.schema([
       pa.field("uid", pa.string()),
       pa.field("event", pa.struct([
           pa.field("properties", pa.struct([
               pa.field("fields", pa.list_(pa.struct([
                   pa.field("messageId", pa.string()),
               ]))),
           ])),
       ])),
   ])
   
   print(schema)
   
   
   
   first_write = pa.Table.from_pylist(data_first_write, schema=schema)
   
   # Write data to Delta table for the first write
   write_deltalake(delta_table_path, first_write, mode="append", engine="rust",
                   schema_mode="merge")
   
   #### NOW FOR THE SECOND WRITE THAT BREAKS ####
   
   data_second_write = [
       {
           "uid": "ws_2",
           "event": {
               "properties": {
                   "someNewField": "test-value", # New field
                   "fields": [
                       {
                           "messageId": "veniam sed et elit adipisicing"
                       }
                   ],
               },
           }
       }
   ]
   
   second_schema = pa.schema([
       pa.field("uid", pa.string()),
       pa.field("event", pa.struct([
           pa.field("properties", pa.struct([
               pa.field("someNewField", pa.string()), # New field
               pa.field("fields", pa.list_(pa.struct([
                   pa.field("messageId", pa.string()),
               ]))),
           ])),
       ])),
   ])
   
   second_write = pa.Table.from_pylist(data_second_write, schema=second_schema)
   
   # Write data to Delta table for the second write
   write_deltalake(delta_table_path, second_write, mode="append", engine="rust",
                   schema_mode="merge")
   ```
   
   ### Expected behavior
   
   DataFusion should support casting between schemas that contain a list 
of structs.
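
As a rough illustration of one possible reading of that expectation, the sketch below reshapes plain Python values to a merged schema by matching struct fields by name. It assumes that fields missing from the input become null; the issue does not state this. `cast_by_name` and the dict/list schema encoding are hypothetical and do not reflect how DataFusion implements casts.

```python
# Hypothetical sketch of a name-based struct cast on plain Python values.
# Schema encoding (an assumption for this example only):
#   dict  -> struct, mapping field name to field type
#   list  -> list type, with the item type as its single element
#   str   -> leaf type name


def cast_by_name(value, target):
    """Reshape `value` to `target`, matching struct fields by name."""
    if value is None:
        return None
    if isinstance(target, dict):
        # Struct: emit fields in target order; missing fields become None.
        return {
            name: cast_by_name(value.get(name), child)
            for name, child in target.items()
        }
    if isinstance(target, list):
        # List: cast every item to the target item type.
        return [cast_by_name(item, target[0]) for item in value]
    return value  # leaf values pass through unchanged


# Merged type of the `event` column after the second write.
merged_event_type = {
    "properties": {
        "fields": [{"messageId": "Utf8"}],
        "someNewField": "Utf8",
    }
}

first_write_event = {
    "properties": {
        "fields": [{"messageId": "veniam sed et elit adipisicing"}],
    }
}
second_write_event = {
    "properties": {
        "someNewField": "test-value",
        "fields": [{"messageId": "veniam sed et elit adipisicing"}],
    }
}

cast_first = cast_by_name(first_write_event, merged_event_type)
cast_second = cast_by_name(second_write_event, merged_event_type)
print(cast_first)
print(cast_second)
```

Both rows end up with the same field order under `properties`, and the row from the first write carries a null `someNewField`.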
   
   ### Additional context
   
   Originating bug report in delta-rs: 
https://github.com/delta-io/delta-rs/issues/3339


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

