kosiew opened a new pull request, #15295:
URL: https://github.com/apache/datafusion/pull/15295

   ## Which issue does this PR close?
   
   - Closes #14757.
   
   ## Rationale for this change
   
   [arrow-rs suggests that SchemaAdapter is a better approach for handling evolving structs](https://github.com/apache/arrow-rs/issues/7176#issuecomment-2676805574).
   This change introduces support for evolving nested schemas in file-based 
data sources, particularly Parquet. In many real-world data ingestion 
pipelines, schemas evolve over time, especially in nested fields, and systems 
need to read both historical and new data seamlessly. This patch provides 
infrastructure to adapt such evolving schemas dynamically without breaking 
query execution.
   
   ## What changes are included in this PR?
   
   - Introduced `NestedStructSchemaAdapter` and 
`NestedStructSchemaAdapterFactory` to handle schema evolution in nested fields.
   - Enhanced the `ListingTableConfig` and `ListingTable` to include and 
propagate an optional `schema_adapter_factory`.
   - Added logic in the physical plan creation to apply schema adapters to 
`FileSource` implementations like `ParquetSource`.
   - Ensured `ParquetFormat` respects and preserves schema adapter factories 
during physical plan creation.
   - Added helper function `preserve_schema_adapter_factory` to maintain schema 
adaptation context in `ParquetSource`.
   - Added comprehensive unit tests for nested schema adaptation, including:
     - Schema adaptation logic for nested structs.
     - Schema mapping and projection logic.
     - Record batch transformation with nested structs and missing fields.
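
To make the idea of nested struct adaptation concrete, here is a minimal, std-only sketch. It is not the code from this PR and does not use the Arrow or DataFusion APIs: `ToyType`, `ToyValue`, and `adapt` are hypothetical names invented for illustration. It assumes that a field missing from an older file should be filled with null and that fields the table schema does not declare are dropped.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a column type (hypothetical; not Arrow's `DataType`).
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int,
    Text,
    Struct(Vec<(String, ToyType)>),
}

/// Toy stand-in for a single value (hypothetical; not an Arrow array).
#[derive(Debug, Clone, PartialEq)]
enum ToyValue {
    Null,
    Int(i64),
    Text(String),
    Struct(BTreeMap<String, ToyValue>),
}

/// Reshape `value`, read with an older file schema, to the `target` type.
/// Fields missing from the file become `Null`, fields the target does not
/// declare are dropped, and nested structs are adapted recursively.
fn adapt(value: &ToyValue, target: &ToyType) -> ToyValue {
    match (value, target) {
        (ToyValue::Struct(fields), ToyType::Struct(target_fields)) => {
            let adapted: BTreeMap<String, ToyValue> = target_fields
                .iter()
                .map(|(name, ty)| {
                    let child = fields
                        .get(name)
                        .map(|v| adapt(v, ty))
                        .unwrap_or(ToyValue::Null);
                    (name.clone(), child)
                })
                .collect();
            ToyValue::Struct(adapted)
        }
        (ToyValue::Int(_), ToyType::Int) | (ToyValue::Text(_), ToyType::Text) => value.clone(),
        // Null input or a type mismatch: this sketch yields Null instead of casting.
        _ => ToyValue::Null,
    }
}
```

For example, adapting a record `{id, info: {name}}` to a target whose `info` struct also declares `email` yields the same record with `info.email` set to `ToyValue::Null`. A real adapter works on whole Arrow arrays and record batches rather than single values.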
   
   ## Are these changes tested?
   
   ✅ Yes.
   
   The patch includes extensive unit tests covering:
   
   - Basic and advanced nested struct adaptation.
   - Schema mapping consistency.
   - Record batch transformation.
   - Error cases and fallback behaviors.
   
   These tests ensure correct and predictable behavior when handling evolving 
nested schemas.
   
   ## Are there any user-facing changes?
   
   ✅ Yes, but non-breaking.
   
   - Users can now provide a `schema_adapter_factory` when constructing a 
`ListingTableConfig`.
   - This enables schema evolution support (including nested structs) for 
supported formats like Parquet.
   
   🔁 If no `schema_adapter_factory` is provided, behavior remains unchanged, 
ensuring backward compatibility.
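
The opt-in behavior described above can be sketched with an optional field that falls back to a default. This is a std-only illustration, not DataFusion's API: `ToyAdapterFactory`, `ToyTableConfig`, `with_schema_adapter_factory`, and `effective_factory` are hypothetical names, and only the field name `schema_adapter_factory` comes from this description.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a schema adapter factory trait.
trait ToyAdapterFactory {
    fn name(&self) -> &'static str;
}

/// Stand-in for the pre-existing default behavior.
struct DefaultFactory;
impl ToyAdapterFactory for DefaultFactory {
    fn name(&self) -> &'static str {
        "default"
    }
}

/// Stand-in for a factory that understands nested structs.
struct NestedStructFactory;
impl ToyAdapterFactory for NestedStructFactory {
    fn name(&self) -> &'static str {
        "nested_struct"
    }
}

/// Hypothetical table config carrying an optional factory.
#[derive(Default)]
struct ToyTableConfig {
    schema_adapter_factory: Option<Arc<dyn ToyAdapterFactory>>,
}

impl ToyTableConfig {
    /// Builder-style setter (the method name is an assumption).
    fn with_schema_adapter_factory(mut self, factory: Arc<dyn ToyAdapterFactory>) -> Self {
        self.schema_adapter_factory = Some(factory);
        self
    }

    /// Return the configured factory, or the default when none was set,
    /// so callers that never opt in see unchanged behavior.
    fn effective_factory(&self) -> Arc<dyn ToyAdapterFactory> {
        match &self.schema_adapter_factory {
            Some(factory) => Arc::clone(factory),
            None => Arc::new(DefaultFactory),
        }
    }
}
```

Keeping the field an `Option` makes the new capability strictly additive: existing call sites that build the config without a factory take the `None` branch and continue to use the default.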
   



