gene-bordegaray commented on code in PR #22657:
URL: https://github.com/apache/datafusion/pull/22657#discussion_r3413159258
##########
datafusion/core/src/datasource/listing/table.rs:
##########
@@ -1286,6 +1288,80 @@ mod tests {
Ok(())
}
+ #[tokio::test]
+ async fn test_list_files_uses_declared_output_partitioning_count() ->
Result<()> {
+ let files = ["bucket/key-prefix/file0", "bucket/key-prefix/file1"];
+
+ let ctx = SessionContext::new();
+ register_test_store(&ctx, &files.iter().map(|f| (*f,
10)).collect::<Vec<_>>());
+
+ let opt = ListingOptions::new(Arc::new(JsonFormat::default()))
+ .with_file_extension_opt(Some(""))
+ .with_target_partitions(1)
+ .with_output_partitioning(Some(Partitioning::RoundRobinBatch(4)));
+
Review Comment:
I agree this is a bit contradictory which is why I tried to make it as
explicit as ossible in docs and tests. I don’t think mapping
with_target_partitions to RoundRobinBatch is the best approach though since it
kinda muddles parallelism with declaring round robin partitioning semantics.
I am suggesting a good amount of follow-up in my comments I realize but I
think it's the best way to handle these decisions so there is more dedicated
thought to them. What would you think about for this PR I keepin the
output_partitioning authoritative when set and that target_partitions is
aligned in that case.
Then I think deprecating or redesigning with_target_partitions is a
discussion we can have in a n issue as its a pretty big knob people use.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]