alamb opened a new issue, #13065:
URL: https://github.com/apache/datafusion/issues/13065

   ### Is your feature request related to a problem or challenge?
   
   DataFusion 42.0.0 shipped the following change:
   - https://github.com/apache/datafusion/pull/11989
   
   It added a new check in the `DefaultPhysicalPlanner` that verifies the schema of the output plan is the same as that of the input plan:
   
   
https://github.com/apache/datafusion/blob/818ce3f01efe1213a9a1eda5dff1542bb9d457f7/datafusion/core/src/physical_planner.rs#L660-L662
   
   Thanks to @jayzhan211's heroic efforts, this check passes in all the DataFusion tests, but it turned out to fail on many downstream implementations:
   - https://github.com/apache/datafusion/issues/12733: during our upgrade of InfluxDB 3.0
   - https://github.com/apache/datafusion/issues/12733: hit by @ion-elgreco, @rtyler, and @Xuanwo while updating delta.rs to DataFusion 42.0.0
   
   Downstream, in InfluxDB 3.0, we turned the check into a warning in our fork to unblock our upgrade.
   
   We even made a patch release to try and get the delta-rs upgrade working:
   - https://github.com/apache/datafusion/issues/12813
   
   But it is still failing as of this writing (see https://github.com/delta-io/delta-rs/pull/2886#issuecomment-2425616646):
   
   > Internal error: Failed due to a difference in schemas,
   > original schema: DFSchema { inner: Schema { fields: [Field { name: "id", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "price", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "sold", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "price_float", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "items_in_bucket", data_type: List(Field { name: "element", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "deleted", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "__delta_rs_update_predicate", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [None, Some(Bare { table: "target" }), None, None, None, None, None], functional_dependencies: FunctionalDependencies { deps: [] } },
   > new schema: DFSchema { inner: Schema { fields: [Field { name: "id", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "price", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "sold", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "price_float", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "items_in_bucket", data_type: List(Field { name: "item", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "deleted", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "__delta_rs_update_predicate", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [None, Some(Bare { table: "target" }), None, None, None, None, None], functional_dependencies: FunctionalDependencies { deps: [] } }.
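
Comparing the two schemas in the error above, they appear to differ only in the name of the child field of the `items_in_bucket` list (`"element"` in the original schema vs `"item"` in the new one). The Rust sketch below illustrates why an exact equality check rejects such a pair while a comparison that ignores list element names accepts it. `DataType`, `Field`, `schema`, `types_equivalent`, and `schemas_equivalent` are hypothetical, heavily simplified stand-ins, not the arrow-rs or DataFusion types, and the relaxed comparison is only one possible policy.

```rust
// Hypothetical, simplified stand-ins for Arrow's `DataType` / `Field`.
// These are NOT the arrow-rs types; they only model enough structure to
// show why strict equality rejects the two schemas in the error message.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    Int64,
    Float64,
    Boolean,
    // A list carries a named child field, as Arrow's `List(Field)` does.
    List(Box<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

fn field(name: &str, data_type: DataType) -> Field {
    Field { name: name.to_string(), data_type, nullable: true }
}

/// Rebuild the schema from the error message, parameterized by the
/// list element name ("element" vs "item").
fn schema(list_item_name: &str) -> Vec<Field> {
    vec![
        field("id", DataType::Utf8),
        field("price", DataType::Int64),
        field("sold", DataType::Int64),
        field("price_float", DataType::Float64),
        field(
            "items_in_bucket",
            DataType::List(Box::new(field(list_item_name, DataType::Utf8))),
        ),
        field("deleted", DataType::Boolean),
        field("__delta_rs_update_predicate", DataType::Boolean),
    ]
}

/// Hypothetical relaxed type comparison: like `==`, but ignores the
/// name of a list's child field (its type and nullability still count).
fn types_equivalent(a: &DataType, b: &DataType) -> bool {
    match (a, b) {
        (DataType::List(fa), DataType::List(fb)) => {
            fa.nullable == fb.nullable && types_equivalent(&fa.data_type, &fb.data_type)
        }
        _ => a == b,
    }
}

/// Top-level field names, nullability, and (relaxed) types must match.
fn schemas_equivalent(a: &[Field], b: &[Field]) -> bool {
    a.len() == b.len()
        && a.iter().zip(b).all(|(fa, fb)| {
            fa.name == fb.name
                && fa.nullable == fb.nullable
                && types_equivalent(&fa.data_type, &fb.data_type)
        })
}

fn main() {
    let original = schema("element");
    let new = schema("item");
    println!("strict equality:     {}", original == new);
    println!("relaxed equivalence: {}", schemas_equivalent(&original, &new));
}
```

Running `main` reports `false` for strict equality and `true` for the relaxed comparison. Whether ignoring list element names is acceptable is a design decision for the planner; the sketch only shows where the mismatch comes from.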
   
   ### Describe the solution you'd like
   
   I would like some way to disable this check to unblock upgrades in downstream crates.
   
   ### Describe alternatives you've considered
   
   I propose we add a new config value that lets downstream crates opt in or out of this check, similar to `datafusion.optimizer.skip_failed_rules` (see [Config Docs](https://datafusion.apache.org/user-guide/configs.html)).
   
   Something like:
   * `datafusion.execution.validate_schema`: If true, the `DefaultPhysicalPlanner` will return an error if the input plan's schema does not exactly match the output plan's schema.
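
A minimal sketch of how the proposed option could gate the check, assuming a hypothetical `PlannerOptions` struct whose `validate_schema` field stands in for `datafusion.execution.validate_schema`. Neither is an existing DataFusion API, and `check_schema` is likewise illustrative. With the option on, a mismatch is an error, as it is now; with it off, the mismatch is logged as a warning and planning continues, similar to the workaround in the InfluxDB 3.0 fork.

```rust
use std::fmt::Debug;

/// Hypothetical planner options; `validate_schema` mirrors the proposed
/// `datafusion.execution.validate_schema` setting (not a real API).
#[derive(Debug, Clone, Copy)]
struct PlannerOptions {
    validate_schema: bool,
}

impl Default for PlannerOptions {
    fn default() -> Self {
        // Assumption: keep today's strict behavior unless a crate opts out.
        Self { validate_schema: true }
    }
}

/// Compare the original and new schemas. On mismatch, either return an
/// error (strict) or print a warning and continue (opted out).
fn check_schema<S: PartialEq + Debug>(
    options: &PlannerOptions,
    original: &S,
    new: &S,
) -> Result<(), String> {
    if original == new {
        return Ok(());
    }
    let msg = format!(
        "Failed due to a difference in schemas, original schema: {original:?}, new schema: {new:?}"
    );
    if options.validate_schema {
        Err(msg)
    } else {
        eprintln!("warning: {msg}");
        Ok(())
    }
}

fn main() {
    // Toy schemas as strings, differing only in the list element name.
    let original = vec!["id: Utf8", "items_in_bucket: List<element: Utf8>"];
    let new = vec!["id: Utf8", "items_in_bucket: List<item: Utf8>"];

    let strict = PlannerOptions::default();
    let relaxed = PlannerOptions { validate_schema: false };

    println!("strict errors:  {}", check_schema(&strict, &original, &new).is_err());
    println!("relaxed passes: {}", check_schema(&relaxed, &original, &new).is_ok());
}
```

Defaulting the option to `true` in this sketch preserves the existing behavior, so only crates that explicitly opt out, such as those blocked on the upgrade, would see the check downgraded to a warning.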
   
   
   
   
   ### Additional context
   
   _No response_

