xudong963 commented on code in PR #16133: URL: https://github.com/apache/datafusion/pull/16133#discussion_r2102026730
##########
datafusion/core/src/datasource/physical_plan/parquet.rs:
##########
@@ -200,26 +210,43 @@ mod tests {
     /// run the test, returning the `RoundTripResult`
     async fn round_trip(&self, batches: Vec<RecordBatch>) -> RoundTripResult {
-        let file_schema = match &self.schema {
+        self.round_trip_with_file_batches(batches, None).await
+    }
+
+    /// run the test, returning the `RoundTripResult`
+    /// If your table schema is different from file schema, you may need to specify the `file_batches` with the file schema
+    /// Or the file schema in the parquet source will be table schema, see `store_parquet` for detail
+    async fn round_trip_with_file_batches(
+        &self,
+        batches: Vec<RecordBatch>,
+        file_batches: Option<Vec<RecordBatch>>,
+    ) -> RoundTripResult {
+        let batches_schema =
+            Schema::try_merge(batches.iter().map(|b| b.schema().as_ref().clone()));
+        let file_schema = match &self.physical_file_schema {
             Some(schema) => schema,
-            None => &Arc::new(
-                Schema::try_merge(
-                    batches.iter().map(|b| b.schema().as_ref().clone()),
-                )
-                .unwrap(),
-            ),
+            None => &Arc::new(batches_schema.as_ref().unwrap().clone()),
         };
         let file_schema = Arc::clone(file_schema);
+        let table_schema = match &self.logical_file_schema {
+            Some(schema) => schema,
+            None => &Arc::new(batches_schema.as_ref().unwrap().clone()),
+        };
+
         // If testing with page_index_predicate, write parquet
         // files with multiple pages
         let multi_page = self.page_index_predicate;
-        let (meta, _files) = store_parquet(batches, multi_page).await.unwrap();

Review Comment:
The original code writes the parquet files from `batches`, so the physical file schema seen by the parquet source is always the table schema (logical file schema). As a result, the tests in https://github.com/apache/datafusion/pull/16086 may be meaningless, because they never read a file whose schema differs from the table schema.
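To illustrate the concern, here is a minimal sketch that does not use the real Arrow or DataFusion types. `Batch`, `merged_columns`, and `written_file_schema` are hypothetical stand-ins, and the fallback to `batches` when `file_batches` is `None` is an assumption about how the new helper is meant to behave, not code taken from the PR.

```rust
// Hypothetical stand-in for Arrow's `RecordBatch`; only column names matter here.
#[derive(Clone, Debug, PartialEq)]
struct Batch {
    columns: Vec<&'static str>,
}

/// Union of column names across batches, in first-seen order.
/// A rough stand-in for merging the batch schemas.
fn merged_columns(batches: &[Batch]) -> Vec<&'static str> {
    let mut out: Vec<&'static str> = Vec::new();
    for batch in batches {
        for column in &batch.columns {
            if !out.contains(column) {
                out.push(*column);
            }
        }
    }
    out
}

/// Columns that end up in the written files. Assumption: when no
/// `file_batches` are supplied, the table-shaped `batches` are written,
/// so the file schema silently equals the table schema.
fn written_file_schema(batches: &[Batch], file_batches: Option<&[Batch]>) -> Vec<&'static str> {
    merged_columns(file_batches.unwrap_or(batches))
}
```

Under these assumptions, passing `None` makes the file schema identical to the table schema, so a test that intends to cover a schema mismatch only exercises one if it supplies `file_batches` with a different shape.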