gabotechs commented on code in PR #22657:
URL: https://github.com/apache/datafusion/pull/22657#discussion_r3414491635
##########
datafusion/core/src/datasource/listing/table.rs:
##########
@@ -1286,6 +1288,80 @@ mod tests {
Ok(())
}
+ #[tokio::test]
+ async fn test_list_files_uses_declared_output_partitioning_count() -> Result<()> {
+ let files = ["bucket/key-prefix/file0", "bucket/key-prefix/file1"];
+
+ let ctx = SessionContext::new();
+ register_test_store(&ctx, &files.iter().map(|f| (*f, 10)).collect::<Vec<_>>());
+
+ let opt = ListingOptions::new(Arc::new(JsonFormat::default()))
+ .with_file_extension_opt(Some(""))
+ .with_target_partitions(1)
+ .with_output_partitioning(Some(Partitioning::RoundRobinBatch(4)));
+
Review Comment:
🤔 Is it? I'm not convinced this is widely used; I'm actually surprised it even exists here.
The "proper" way of telling DF how many partitions to use is the standard config parameter `datafusion.execution.target_partitions`, so I'm surprised there's an alternative path for choosing the number of partitions.
I'll dig a bit more into the `git` history to get some context on when this was created and how it's expected to be used, given that `datafusion.execution.target_partitions` already exists. Maybe we'll find that this is something that should have been deprecated a long time ago.