Re: [PR] fix: Ignore empty files in ListingTable when listing files with or without partition filters, as well as when inferring schema [datafusion]

via GitHub Sun, 15 Dec 2024 13:38:43 -0800


alamb commented on code in PR #13750:
URL: https://github.com/apache/datafusion/pull/13750#discussion_r1885873214



##########
datafusion/core/src/datasource/file_format/csv.rs:
##########
@@ -1259,73 +1259,57 @@ mod tests {
         Ok(())
     }
 
-    /// Read a single empty csv file in parallel
+    /// Read a single empty csv file
     ///
     /// empty_0_byte.csv:
     /// (file is empty)
-    #[rstest(n_partitions, case(1), case(2), case(3), case(4))]
     #[tokio::test]
-    async fn test_csv_parallel_empty_file(n_partitions: usize) -> Result<()> {

Review Comment:
   I did some research and found that this test seems to have been added in 
https://github.com/apache/datafusion/pull/6801 by @2010YOUY01. As long as the 
code works with empty files (i.e. doesn't throw an error or go into an infinite 
loop), I think we are good.
   
   Thus I suggest keeping at least one test that sets the repartition minimum 
file size to 0 and verifies that reading an empty file neither errors nor hangs.
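
   To make the edge case concrete, here is a minimal, self-contained sketch of 
what such a test guards against. The `split_ranges` helper below is 
hypothetical and is not DataFusion's repartitioning code; it only illustrates 
how a byte-range splitter with a minimum size of 0 can handle a 0-byte file 
without looping forever or producing empty ranges.

```rust
/// Hypothetical helper (not DataFusion's API): split a file of `file_size`
/// bytes into at most `n_partitions` contiguous byte ranges. Files smaller
/// than `min_size` are kept as a single range.
fn split_ranges(file_size: u64, n_partitions: u64, min_size: u64) -> Vec<(u64, u64)> {
    // An empty file yields no ranges. Returning early also avoids a chunk
    // size of zero below, which would keep the loop from ever advancing.
    if file_size == 0 || n_partitions == 0 {
        return Vec::new();
    }
    // Files below the threshold are read as one range.
    if file_size < min_size {
        return vec![(0, file_size)];
    }
    // Ceiling division so every byte is covered; always >= 1 at this point.
    let chunk = (file_size + n_partitions - 1) / n_partitions;
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < file_size {
        let end = (start + chunk).min(file_size);
        ranges.push((start, end));
        start = end;
    }
    ranges
}

#[cfg(test)]
mod empty_file_tests {
    use super::split_ranges;

    /// With a minimum size of 0, an empty file must produce no work,
    /// whatever the partition count.
    #[test]
    fn empty_file_with_zero_min_size() {
        for n_partitions in 1..=4 {
            assert!(split_ranges(0, n_partitions, 0).is_empty());
        }
    }
}
```

   A real test in csv.rs would exercise the actual scan path instead of a 
stand-in like this, but the property being checked is the same.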



##########
datafusion/core/src/datasource/listing/helpers.rs:
##########
@@ -671,6 +680,106 @@ mod tests {
         );
     }
 
+    fn describe_partition(partition: &Partition) -> (&str, usize, Vec<&str>) {

Review Comment:
   A comment here explaining what the str/usize/Vec<&str> values in the 
returned tuple mean would be nice for future readers.
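
   As a sketch of what such a comment could look like: the diff above does not 
show the function body, so the meaning assigned to each tuple element here 
(path, depth, file names) is an assumption, and the `Partition` struct is a 
hypothetical stand-in with assumed field names and types, not the real 
definition.

```rust
/// Hypothetical stand-in for the `Partition` type used by the tests; the
/// field names and types are assumptions made for this sketch.
struct Partition {
    path: String,
    depth: usize,
    files: Option<Vec<String>>,
}

/// Summarize a partition as `(path, depth, file_names)` so that tests can
/// compare it against plain literals:
///
/// - `&str`: the partition's path
/// - `usize`: the partition's depth below the table root
/// - `Vec<&str>`: names of the files listed in the partition (empty if the
///   partition has not been listed)
fn describe_partition(partition: &Partition) -> (&str, usize, Vec<&str>) {
    (
        partition.path.as_str(),
        partition.depth,
        partition
            .files
            .as_ref()
            .map(|files| files.iter().map(String::as_str).collect())
            .unwrap_or_default(),
    )
}
```

   Documenting the tuple positions at the definition keeps assertions that 
compare against tuple literals readable without opening the helper.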



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
