Re: [PR] Add declared file scan output partitioning [datafusion]

via GitHub Mon, 15 Jun 2026 04:41:17 -0700


gene-bordegaray commented on code in PR #22657:
URL: https://github.com/apache/datafusion/pull/22657#discussion_r3413159258



##########
datafusion/core/src/datasource/listing/table.rs:
##########
@@ -1286,6 +1288,80 @@ mod tests {
         Ok(())
     }
 
+    #[tokio::test]
+    async fn test_list_files_uses_declared_output_partitioning_count() -> 
Result<()> {
+        let files = ["bucket/key-prefix/file0", "bucket/key-prefix/file1"];
+
+        let ctx = SessionContext::new();
+        register_test_store(&ctx, &files.iter().map(|f| (*f, 
10)).collect::<Vec<_>>());
+
+        let opt = ListingOptions::new(Arc::new(JsonFormat::default()))
+            .with_file_extension_opt(Some(""))
+            .with_target_partitions(1)
+            .with_output_partitioning(Some(Partitioning::RoundRobinBatch(4)));
+

Review Comment:
    I agree this is a bit contradictory which is why I tried to make it as 
explicit as ossible in docs and tests. I don’t think mapping 
with_target_partitions to RoundRobinBatch is the best approach though since it 
kinda muddles parallelism with declaring round robin partitioning semantics.
   
   I am suggesting a good amount of follow-up in my comments I realize but I 
think it's the best way to handle these decisions so there is more dedicated 
thought to them. What would you think about for this PR I keepin the 
output_partitioning authoritative when set and that target_partitions is 
aligned in that case.
   
   Then I think deprecating or redesigning with_target_partitions is a 
discussion we can have in a n issue as its a pretty big knob people use.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Add declared file scan output partitioning [datafusion]

Reply via email to