alamb commented on code in PR #14754:
URL: https://github.com/apache/datafusion/pull/14754#discussion_r1963520316


##########
datafusion/core/src/datasource/data_source.rs:
##########
@@ -67,4 +69,33 @@ pub trait FileSource: Send + Sync {
     /// If this returns true, the DataSourceExec may repartition the data
     /// by breaking up the input files into multiple smaller groups.
     fn supports_repartition(&self, config: &FileScanConfig) -> bool;

Review Comment:
   It seems like we could remove `supports_repartition` as well:
   1. This API hasn't been released yet, so removing it wouldn't be a breaking change.
   2. It would ensure that every place in the code that needs to check whether repartitioning is supported goes through a single API (and thus behaves consistently).
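
A minimal sketch of what a single entry point could look like, to illustrate point 2. Everything here is a hypothetical stand-in rather than the actual DataFusion types or signatures: `FileGroup`, `SplittableSource`, `OpaqueSource`, and the shape of `repartitioned` are assumptions made for illustration only. The idea is that returning `None` takes over the role of `supports_repartition` returning `false`, so callers have only one method to consult.

```rust
// Hypothetical, simplified stand-in for a group of files (one per partition).
#[derive(Debug, Clone, PartialEq)]
pub struct FileGroup {
    pub file_sizes: Vec<u64>,
}

// Hypothetical trait: a single method both answers "is repartitioning
// supported?" and performs it, so there is no separate boolean to keep in sync.
pub trait FileSource {
    /// Returns `Some(groups)` when the source repartitioned its files,
    /// or `None` when repartitioning is unsupported or not possible.
    fn repartitioned(
        &self,
        groups: &[FileGroup],
        target_partitions: usize,
    ) -> Option<Vec<FileGroup>>;
}

/// A source whose files can be freely redistributed.
pub struct SplittableSource;

/// A source that cannot be repartitioned at all.
pub struct OpaqueSource;

impl FileSource for SplittableSource {
    fn repartitioned(
        &self,
        groups: &[FileGroup],
        target_partitions: usize,
    ) -> Option<Vec<FileGroup>> {
        if target_partitions == 0 {
            return None;
        }
        // Round-robin assignment of files to partitions. This is only a
        // placeholder; it is not the size-based strategy the PR describes.
        let mut out = vec![FileGroup { file_sizes: Vec::new() }; target_partitions];
        let all_files = groups.iter().flat_map(|g| g.file_sizes.iter().copied());
        for (i, size) in all_files.enumerate() {
            out[i % target_partitions].file_sizes.push(size);
        }
        Some(out)
    }
}

impl FileSource for OpaqueSource {
    fn repartitioned(&self, _: &[FileGroup], _: usize) -> Option<Vec<FileGroup>> {
        // `None` plays the role that `supports_repartition() == false` had.
        None
    }
}
```

With this shape, a caller would match on the result of `repartitioned` and keep the existing file groups on `None`, instead of checking a boolean first and then calling a second method.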



##########
datafusion/core/src/datasource/data_source.rs:
##########
@@ -67,4 +69,33 @@ pub trait FileSource: Send + Sync {
     /// If this returns true, the DataSourceExec may repartition the data
     /// by breaking up the input files into multiple smaller groups.
     fn supports_repartition(&self, config: &FileScanConfig) -> bool;
+
+    /// If supported by the [`FileSource`], redistribute files across partitions according to their size.
+    /// Allows custom file formats to implement their own repartitioning logic.
+    ///
+    /// Provides a default repartitioning behavior, see comments on [`FileGroupPartitioner`] for more detail.

Review Comment:
   💯  for adding a link to `FileGroupPartitioner`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

