andygrove commented on code in PR #1443:
URL: https://github.com/apache/datafusion-comet/pull/1443#discussion_r1996049247

##########
native/core/src/execution/planner.rs:
##########
@@ -181,6 +179,61 @@ impl PhysicalPlanner {
         }
     }
 
+    /// get DataFusion PartitionedFiles from a Spark FilePartition
+    fn get_partitioned_files(
+        &self,
+        partition: &SparkFilePartition,
+    ) -> Result<Vec<PartitionedFile>, ExecutionError> {
+        let mut files = Vec::with_capacity(partition.partitioned_file.len());
+        partition.partitioned_file.iter().try_for_each(|file| {
+            assert!(file.start + file.length <= file.file_size);
+
+            let mut partitioned_file = PartitionedFile::new_with_range(
+                String::new(), // Dummy file path.
+                file.file_size as u64,
+                file.start,
+                file.start + file.length,
+            );
+
+            // Spark sends the path over as URL-encoded, parse that first.
+            let url = Url::parse(file.file_path.as_ref()).unwrap();
+            // Convert that to a Path object to use in the PartitionedFile.
+            let path = Path::from_url_path(url.path()).unwrap();

Review Comment:
   It would be nice to remove these unwraps and return Err. I know that the goal of this PR is to reorganize the code so we can look at removing unwraps as a separate PR if that is easier.
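
   For illustration only, here is a minimal sketch of the pattern being suggested: inside a `try_for_each` closure, each fallible step is mapped into the function's error type and propagated with `?` instead of calling `unwrap()` or `assert!`. Everything named here is an assumption made so the sketch compiles with the standard library alone: `PlanError` stands in for the planner's real error type, `FileSlice` for the Spark file message, and `url_to_path` for the `Url::parse` plus `Path::from_url_path` pair. It is not the code from this PR.

```rust
/// Hypothetical stand-in for the planner's error type.
#[derive(Debug, PartialEq)]
enum PlanError {
    InvalidFile(String),
}

/// Hypothetical, trimmed-down version of one file entry in a Spark partition.
struct FileSlice {
    file_path: String,
    start: i64,
    length: i64,
    file_size: i64,
}

/// Std-only stand-in for URL parsing plus path conversion.
/// It only accepts `file://` URLs and returns the remainder as the path.
fn url_to_path(raw: &str) -> Result<String, String> {
    match raw.strip_prefix("file://") {
        Some(path) if !path.is_empty() => Ok(path.to_string()),
        _ => Err(format!("not a file:// URL: {raw}")),
    }
}

/// Builds (path, start, end) triples, returning Err instead of panicking.
fn partitioned_ranges(files: &[FileSlice]) -> Result<Vec<(String, i64, i64)>, PlanError> {
    let mut out = Vec::with_capacity(files.len());
    files.iter().try_for_each(|file| {
        // Takes the place of the assert!: an out-of-bounds range becomes an error.
        let end = file
            .start
            .checked_add(file.length)
            .filter(|end| *end <= file.file_size)
            .ok_or_else(|| {
                PlanError::InvalidFile(format!(
                    "range {}+{} exceeds file size {} for {}",
                    file.start, file.length, file.file_size, file.file_path
                ))
            })?;

        // Takes the place of the two unwrap() calls: map the failure and propagate it.
        let path = url_to_path(&file.file_path).map_err(PlanError::InvalidFile)?;

        out.push((path, file.start, end));
        Ok::<(), PlanError>(())
    })?;
    Ok(out)
}
```

   The closure's explicit `Ok::<(), PlanError>(())` gives `try_for_each` a concrete error type, so the trailing `?` can hand the first failure straight back to the caller. In the real function the same shape would presumably apply with the existing error type in place of `PlanError`.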