xudong963 commented on code in PR #15865:
URL: https://github.com/apache/datafusion/pull/15865#discussion_r2065855163
##########
datafusion/core/src/datasource/listing/table.rs:
##########
@@ -1129,7 +1130,17 @@ impl ListingTable {
 let (file_group, inexact_stats) =
     get_files_with_limit(files, limit, self.options.collect_stat).await?;
-let file_groups = file_group.split_files(self.options.target_partitions);
+let mut file_groups = file_group.split_files(self.options.target_partitions);
+let (schema_mapper, _) = DefaultSchemaAdapterFactory::from_schema(self.schema())
Review Comment:
@alamb While working on https://github.com/apache/datafusion/pull/15852, I
found that the listing table does not actually have the issue described in
https://github.com/apache/datafusion/issues/15689. All files here already share
the same schema, because when the table is created, every fetched file already
goes through the `SchemaMapper` to reorder its schema; see
https://github.com/apache/datafusion/blob/main/datafusion/datasource-parquet/src/opener.rs#L206.
What we should fix instead is making the file schema match the listing table
schema. Typically, when users specify partition columns, the table schema
carries the extra partition column information that the file schema lacks. So I
moved the mapper down into the `compute_all_files_statistics` method in this
commit:
https://github.com/apache/datafusion/pull/15852/commits/689fc669c47581b86d6e4c12d73210f997c4cb10.
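
To illustrate the mismatch described above, here is a minimal, std-only sketch. It is a toy model and not the DataFusion API: `ColumnStats`, `build_projection`, and `map_column_stats` are hypothetical names invented for this example. It assumes the file schema is a reordered subset of the table schema and that the table schema appends a partition column the file does not contain.

```rust
/// Toy per-column statistics; `None` means the value is unknown.
/// (Hypothetical type, not DataFusion's `ColumnStatistics`.)
#[derive(Debug, Clone, PartialEq)]
struct ColumnStats {
    null_count: Option<usize>,
}

/// For each table column, find the index of the matching file column,
/// or `None` when the file has no such column (e.g. a partition column).
fn build_projection(table_cols: &[&str], file_cols: &[&str]) -> Vec<Option<usize>> {
    table_cols
        .iter()
        .map(|t| file_cols.iter().position(|f| f == t))
        .collect()
}

/// Reorder file-level statistics into table-schema order, filling columns
/// that are missing from the file with "unknown" statistics.
fn map_column_stats(
    projection: &[Option<usize>],
    file_stats: &[ColumnStats],
) -> Vec<ColumnStats> {
    projection
        .iter()
        .map(|idx| match idx {
            Some(i) => file_stats[*i].clone(),
            None => ColumnStats { null_count: None },
        })
        .collect()
}
```

With a table schema of `["a", "b", "year"]` and a file schema of `["b", "a"]`, the projection is `[Some(1), Some(0), None]`, so the mapped statistics have one entry per table column and the `year` partition column ends up with unknown statistics.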
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]