kosiew opened a new pull request, #15295: URL: https://github.com/apache/datafusion/pull/15295
## Which issue does this PR close?

- Closes #14757.

## Rationale for this change

[arrow-rs suggests that SchemaAdapter is a better approach for handling evolving structs](https://github.com/apache/arrow-rs/issues/7176#issuecomment-2676805574).

This change introduces support for evolving nested schemas in file-based data sources, particularly for Parquet. In many real-world data ingestion pipelines, schemas evolve over time, especially in nested fields, and systems need to read historical and new data seamlessly. This patch provides infrastructure to adapt such evolving schemas dynamically without breaking query execution.

## What changes are included in this PR?

- Introduced `NestedStructSchemaAdapter` and `NestedStructSchemaAdapterFactory` to handle schema evolution in nested fields.
- Extended `ListingTableConfig` and `ListingTable` to include and propagate an optional `schema_adapter_factory`.
- Added logic in physical plan creation to apply schema adapters to `FileSource` implementations like `ParquetSource`.
- Ensured `ParquetFormat` respects and preserves schema adapter factories during physical plan creation.
- Added helper function `preserve_schema_adapter_factory` to maintain schema adaptation context in `ParquetSource`.
- Added comprehensive unit tests for nested schema adaptation, including:
  - Schema adaptation logic for nested structs.
  - Schema mapping and projection logic.
  - Record batch transformation with nested structs and missing fields.

## Are these changes tested?

✅ Yes. The patch includes extensive unit tests covering:

- Basic and advanced nested struct adaptation.
- Schema mapping consistency.
- Record batch transformation.
- Error cases and fallback behaviors.

These tests ensure correct and predictable behavior when handling evolving nested schemas.

## Are there any user-facing changes?

✅ Yes, but non-breaking.

- Users can now provide a `schema_adapter_factory` when constructing a `ListingTableConfig`.
- This enables schema evolution support (including nested structs) for supported formats like Parquet.

🔁 If no `schema_adapter_factory` is provided, behavior remains unchanged, ensuring backward compatibility.
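
To illustrate the idea behind nested struct adaptation, here is a minimal, std-only sketch. It is not the code in this PR and does not use the DataFusion or Arrow APIs: `DataType`, `Field`, `Value`, `field`, `struct_of`, and `adapt_value` are hypothetical toy names invented for this example. It assumes the adapter's core behavior is to walk the target (table) struct recursively, keep fields present in the file, fill fields missing from the file with nulls, and drop fields the target does not declare.

```rust
use std::collections::BTreeMap;

/// Toy data types (hypothetical, not Arrow's). `Struct` carries its own
/// child fields, so nesting is naturally recursive.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    Struct(Vec<Field>),
}

/// A named field in a toy schema.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

/// Toy row value standing in for a column of a record batch.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Int64(i64),
    Utf8(String),
    Struct(BTreeMap<String, Value>),
}

/// Convenience constructor for a field.
fn field(name: &str, data_type: DataType) -> Field {
    Field { name: name.to_string(), data_type }
}

/// Convenience constructor for a struct value from (name, value) pairs.
fn struct_of(pairs: Vec<(&str, Value)>) -> Value {
    Value::Struct(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Adapt a value read with an older file schema to the target type.
/// - Fields missing from the file become `Value::Null`.
/// - Fields not declared in the target are dropped.
/// - Nested structs are adapted recursively.
/// - A scalar whose type does not match the target is an error.
fn adapt_value(value: &Value, target: &DataType) -> Result<Value, String> {
    match (value, target) {
        (Value::Null, _) => Ok(Value::Null),
        (Value::Int64(v), DataType::Int64) => Ok(Value::Int64(*v)),
        (Value::Utf8(s), DataType::Utf8) => Ok(Value::Utf8(s.clone())),
        (Value::Struct(children), DataType::Struct(target_fields)) => {
            let mut out = BTreeMap::new();
            for f in target_fields {
                let adapted = match children.get(&f.name) {
                    Some(child) => adapt_value(child, &f.data_type)?,
                    None => Value::Null, // field added after this file was written
                };
                out.insert(f.name.clone(), adapted);
            }
            Ok(Value::Struct(out))
        }
        (other, expected) => Err(format!("cannot adapt {:?} to {:?}", other, expected)),
    }
}
```

For example, adapting an old row whose `address` struct only has `city` to a target whose `address` struct also declares `zip` yields a row where `address.zip` is null, while a type mismatch such as an integer where a string is expected returns an error. The real adapter operates on Arrow arrays and schemas rather than per-row values; the recursion over struct fields is the part this sketch is meant to convey.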