alamb opened a new issue, #13065: URL: https://github.com/apache/datafusion/issues/13065
### Is your feature request related to a problem or challenge?

The following PR, released in DataFusion 42.0.0:
- https://github.com/apache/datafusion/pull/11989

added a new check in the `DefaultPhysicalPlanner` that the schema of the output plan is the same as the schema of the input plan:

https://github.com/apache/datafusion/blob/818ce3f01efe1213a9a1eda5dff1542bb9d457f7/datafusion/core/src/physical_planner.rs#L660-L662

While @jayzhan211's heroic efforts have this check passing in all the DataFusion tests, it turned out to fail in many downstream implementations:
- https://github.com/apache/datafusion/issues/12733 during our upgrade in InfluxDB 3.0
- https://github.com/apache/datafusion/issues/12733 @ion-elgreco, @rtyler, and @Xuanwo while updating delta.rs to DataFusion 42.0.0

Downstream in InfluxDB 3.0 we turned the check into a warning in our fork to unblock our upgrade.

We even made a patch release to try to get the delta-rs upgrade working:
- https://github.com/apache/datafusion/issues/12813

But it is still failing as I write this (see https://github.com/delta-io/delta-rs/pull/2886#issuecomment-2425616646):

> Internal error: Failed due to a difference in schemas, original schema: DFSchema { inner: Schema { fields: [Field { name: "id", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "price", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "sold", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "price_float", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "items_in_bucket", data_type: List(Field { name: "element", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "deleted", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "__delta_rs_update_predicate", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [None, Some(Bare { table: "target" }), None, None, None, None, None], functional_dependencies: FunctionalDependencies { deps: [] } }, new schema: DFSchema { inner: Schema { fields: [Field { name: "id", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "price", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "sold", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "price_float", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "items_in_bucket", data_type: List(Field { name: "item", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "deleted", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "__delta_rs_update_predicate", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [None, Some(Bare { table: "target" }), None, None, None, None, None], functional_dependencies: FunctionalDependencies { deps: [] } }.

### Describe the solution you'd like

I would like some way to disable this check to unblock upgrades in downstream crates.

### Describe alternatives you've considered

I propose we add a new config value that lets downstream crates opt in to or out of this check, similar to `datafusion.optimizer.skip_failed_rules` (see [Config Docs](https://datafusion.apache.org/user-guide/configs.html)).

Something like:
* `datafusion.execution.validate_schema`: If true, the `DefaultPhysicalPlanner` will error if the input plan's schema does not exactly match the output plan's schema.
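To make the proposal concrete, here is a minimal sketch of how such a flag could gate the check. It is not DataFusion code: `DataType`, `Field`, `SchemaCheck`, `check_schema`, and `example_schemas` are hypothetical, simplified stand-ins that use only the standard library, and the `validate_schema` parameter models the proposed (not yet existing) `datafusion.execution.validate_schema` option. The example schemas are a trimmed version of the ones in the error above, which differ only in the name of the list's inner field (`"element"` vs `"item"`).

```rust
// Hypothetical, simplified stand-ins for the Arrow/DataFusion schema types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    Int64,
    List(Box<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

// Convenience constructor for a nullable field.
fn field(name: &str, data_type: DataType) -> Field {
    Field { name: name.to_string(), data_type, nullable: true }
}

// Result of the check when planning is allowed to continue.
#[derive(Debug, PartialEq)]
enum SchemaCheck {
    Match,
    // The schemas differ, but validation is disabled; carries a warning message.
    MismatchIgnored(String),
}

// Assumed behavior of the proposed flag: error on mismatch when
// `validate_schema` is true, otherwise downgrade the mismatch to a warning.
fn check_schema(
    validate_schema: bool,
    input: &[Field],
    output: &[Field],
) -> Result<SchemaCheck, String> {
    if input == output {
        return Ok(SchemaCheck::Match);
    }
    let msg = format!(
        "Failed due to a difference in schemas, original schema: {:?}, new schema: {:?}",
        input, output
    );
    if validate_schema {
        Err(msg)
    } else {
        Ok(SchemaCheck::MismatchIgnored(msg))
    }
}

// Trimmed version of the two schemas from the error message: only the inner
// list field name differs ("element" vs "item").
fn example_schemas() -> (Vec<Field>, Vec<Field>) {
    let build = |inner_name: &str| {
        vec![
            field("id", DataType::Utf8),
            field("price", DataType::Int64),
            field(
                "items_in_bucket",
                DataType::List(Box::new(field(inner_name, DataType::Utf8))),
            ),
        ]
    };
    (build("element"), build("item"))
}
```

With `validate_schema` set to true, `check_schema` returns an error for the two example schemas, matching the current behavior; with it set to false, the mismatch is reported as `SchemaCheck::MismatchIgnored`, similar in spirit to the warning used in the InfluxDB fork.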
### Additional context

_No response_
