kosiew opened a new pull request, #16305: URL: https://github.com/apache/datafusion/pull/16305
## Which issue does this PR close?

- Closes #16270

## Rationale for this change

The current behavior of `ListingTable` in DataFusion can produce inconsistent projected schemas depending on the order of input files, even when a schema is explicitly provided. This inconsistency is particularly problematic in use cases involving schema evolution or optional/nested fields.

This PR introduces an explicit `SchemaSource` enum to track how a schema was derived: `None`, `Inferred`, or `Specified`. This ensures that schema inference does not overwrite an explicitly provided schema, making `ListingTable` behavior predictable and robust across file order variations.

## What changes are included in this PR?

- Introduced `SchemaSource` enum to track the origin of a schema.
- Updated `ListingTableConfig` and `ListingTable` to store and respect `schema_source`.
- Modified schema inference logic to retain specified schemas and only infer when none is provided.
- Added methods to query the schema source from `ListingTable` and `ListingTableConfig`.
- Extended existing tests and added new ones to verify:
  - Schema consistency regardless of file order.
  - Schema source tracking behavior across all config transformations.
  - Correct behavior with multi-file inputs and optional fields.

## Are these changes tested?

Yes. Several unit tests have been added to verify:

- The schema source is correctly preserved through config operations.
- `ListingTable` uses the explicitly provided schema instead of inferring from the first file.
- Output schema remains consistent regardless of file order.
- Inferred schema reflects the first file only when no schema is provided.

## Are there any user-facing changes?

Yes, but they are non-breaking:

- Users can now rely on `ListingTable` to respect explicitly provided schemas even when file contents vary.
- Behavior is now deterministic across different file orderings.
- Diagnostic capabilities are improved with access to the schema source via `ListingTable::schema_source()`.
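
For readers who want the intended semantics at a glance, here is a minimal standalone sketch of the schema-source tracking described above. It is not the DataFusion implementation: `TableConfig`, `with_schema`, `infer_schema`, and the `cols` helper are hypothetical names chosen for illustration, and a schema is modeled as a plain list of column names rather than an Arrow schema. Only the `SchemaSource` variant names (`None`, `Inferred`, `Specified`) come from this PR description.

```rust
/// Simplified stand-in for a schema: an ordered list of column names.
/// (Assumption for illustration; the real code works with Arrow schemas.)
pub type Schema = Vec<String>;

/// How the schema of a table config was obtained.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum SchemaSource {
    /// No schema has been set yet.
    #[default]
    None,
    /// The schema was inferred from the input files.
    Inferred,
    /// The schema was explicitly provided by the user.
    Specified,
}

/// Hypothetical, simplified model of a listing-table configuration.
#[derive(Debug, Clone, Default)]
pub struct TableConfig {
    schema: Option<Schema>,
    schema_source: SchemaSource,
}

impl TableConfig {
    /// Set an explicit schema and record that it was user-specified.
    pub fn with_schema(mut self, schema: Schema) -> Self {
        self.schema = Some(schema);
        self.schema_source = SchemaSource::Specified;
        self
    }

    /// Infer a schema from the first file, but only if none is present.
    /// An existing schema (specified or inferred) is left untouched.
    pub fn infer_schema(mut self, files: &[Schema]) -> Self {
        if self.schema.is_none() {
            if let Some(first) = files.first() {
                self.schema = Some(first.clone());
                self.schema_source = SchemaSource::Inferred;
            }
        }
        self
    }

    /// The schema currently held by this config, if any.
    pub fn schema(&self) -> Option<&Schema> {
        self.schema.as_ref()
    }

    /// Where the current schema came from.
    pub fn schema_source(&self) -> SchemaSource {
        self.schema_source
    }
}

/// Helper to build a schema from string literals.
pub fn cols(names: &[&str]) -> Schema {
    names.iter().map(|n| n.to_string()).collect()
}
```

In this model, calling `with_schema(...)` followed by `infer_schema(...)` yields the same schema whether the files are listed as `[a, b]` or `[b, a]`, and `schema_source()` reports `Specified`. Without an explicit schema, the result follows the first file and the source is `Inferred`, which mirrors the behavior the tests in this PR are described as checking.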