kosiew opened a new pull request, #16371: URL: https://github.com/apache/datafusion/pull/16371
## Which issue does this PR close?

This is the last of a series of PRs re-implementing #15295 to close #14757 by adding schema-evolution support for listing-based tables with nested structs in DataFusion.

- Closes #14757

## Rationale for this change

This change lets DataFusion's listing-based tables support schema evolution when files contain nested struct fields whose structure varies over time. It makes data ingestion pipelines more robust in environments where schema drift is common, such as data lakes and log-based ingestion.

Previously, nested structs with evolved schemas could lead to incompatibility errors or data loss. This PR introduces a flexible schema adaptation mechanism through the `SchemaAdapterFactory` trait, allowing custom logic to map differing schemas safely and correctly.

## What changes are included in this PR?

- Added a `schema_adapter_factory` field to `ListingTableConfig` and `ListingTable`.
- Added support for injecting custom `SchemaAdapterFactory` implementations.
- Implemented a `NestedStructSchemaAdapterFactory` that handles nested structs and evolves them by injecting nulls for missing nested fields.
- Integrated the factory into the listing table execution path (scan, statistics, file listing).
- Updated the default behavior to use `DefaultSchemaAdapterFactory` if no factory is provided.
- Added comprehensive tests covering:
  - Adapter selection
  - Mapping of nested structs
  - Error propagation for incompatible schemas
  - Column statistics transformation through the adapter

## Are these changes tested?

✅ Yes, the PR includes extensive unit tests that verify:

- Behavior of schema adapter factories under different schema conditions
- Handling of missing nested fields
- Adaptation logic for struct arrays
- Mapping and transformation of column statistics
- Error propagation when schema adaptation fails

## Are there any user-facing changes?
✅ Yes, this PR introduces the ability to:

- Provide custom schema adaptation logic to `ListingTable` through `ListingTableConfig::with_schema_adapter_factory`
- Seamlessly read and evolve files with changing nested struct schemas

There are no breaking changes to public APIs. The added functionality is optional and backward-compatible with existing behavior.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: github-h...@datafusion.apache.org
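
For readers unfamiliar with the idea, the null-filling behavior described for `NestedStructSchemaAdapterFactory` can be illustrated with a small standalone sketch. This is not DataFusion or Arrow code: the `Ty` and `Value` types and the `adapt` function are hypothetical stand-ins, it works on single values rather than arrays, and it uses only the standard library. It assumes that fields missing from the file become null, fields absent from the table schema are dropped, and a type mismatch is reported as an error.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified type description (stand-in for a real schema type).
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Text,
    Struct(Vec<(String, Ty)>),
}

/// Hypothetical value read from a file; `Null` marks an absent field.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Int(i64),
    Text(String),
    Struct(BTreeMap<String, Value>),
}

/// Adapt a value written with an older schema to the `target` table type.
/// Nested fields missing from the file become `Null`, fields not present
/// in the target are dropped, and incompatible types produce an error.
fn adapt(value: &Value, target: &Ty) -> Result<Value, String> {
    match (value, target) {
        // Nulls are accepted for any target type in this sketch.
        (Value::Null, _) => Ok(Value::Null),
        // Matching scalar types pass through unchanged.
        (Value::Int(_), Ty::Int) | (Value::Text(_), Ty::Text) => Ok(value.clone()),
        // Structs are rebuilt field by field, recursing into nested structs.
        (Value::Struct(fields), Ty::Struct(target_fields)) => {
            let mut out = BTreeMap::new();
            for (name, ty) in target_fields {
                let adapted = match fields.get(name) {
                    Some(v) => adapt(v, ty)?,
                    None => Value::Null, // field added after this file was written
                };
                out.insert(name.clone(), adapted);
            }
            Ok(Value::Struct(out))
        }
        // Anything else is an incompatible schema.
        (v, t) => Err(format!("cannot adapt {:?} to {:?}", v, t)),
    }
}
```

Calling `adapt` on a struct value that lacks a nested field the table schema declares yields the same struct with that field set to `Value::Null`, while passing a text value where an integer is expected returns an `Err`. This mirrors the "missing nested fields" and "error propagation" cases listed in the tests above.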