alamb commented on code in PR #14754:
URL: https://github.com/apache/datafusion/pull/14754#discussion_r1963520316

##########
datafusion/core/src/datasource/data_source.rs:
##########
@@ -67,4 +69,33 @@ pub trait FileSource: Send + Sync {
     /// If this returns true, the DataSourceExec may repartition the data
     /// by breaking up the input files into multiple smaller groups.
     fn supports_repartition(&self, config: &FileScanConfig) -> bool;

Review Comment:
   It seems like we could remove `supports_repartition` as well:
   1. This API hasn't yet been released, so it wouldn't be a breaking change.
   2. It would ensure that all places in the code that need to check repartition use a single API (and thus are consistently done).



##########
datafusion/core/src/datasource/data_source.rs:
##########
@@ -67,4 +69,33 @@ pub trait FileSource: Send + Sync {
     /// If this returns true, the DataSourceExec may repartition the data
     /// by breaking up the input files into multiple smaller groups.
     fn supports_repartition(&self, config: &FileScanConfig) -> bool;
+
+    /// If supported by the [`FileSource`], redistribute files across partitions according to their size.
+    /// Allows custom file formats to implement their own repartitioning logic.
+    ///
+    /// Provides a default repartitioning behavior, see comments on [`FileGroupPartitioner`] for more detail.

Review Comment:
   💯 for adding a link to `FileGroupPartitioner`
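
To illustrate the single-API idea from the first comment, here is a minimal sketch. It assumes simplified stand-in types rather than the real DataFusion definitions: `FileGroup`, `repartitioned`, `default_repartition`, `plan_groups`, and the two example sources are all hypothetical names invented for this sketch. The default body balances whole files greedily by size, which is only a placeholder and not the `FileGroupPartitioner` logic. The point is that returning `None` can signal "no repartitioning", so a separate boolean method is not needed.

```rust
// Hypothetical, simplified stand-in for a group of files; not a DataFusion type.
#[derive(Debug, Clone, PartialEq)]
struct FileGroup {
    /// File sizes in bytes (stand-in for real file metadata).
    file_sizes: Vec<u64>,
}

/// Sketch of a source trait with one repartitioning entry point.
trait FileSource {
    /// Returns `Some(new_groups)` if the files were redistributed, or `None`
    /// if this source cannot (or chose not to) repartition.
    fn repartitioned(
        &self,
        groups: &[FileGroup],
        target_partitions: usize,
    ) -> Option<Vec<FileGroup>> {
        default_repartition(groups, target_partitions)
    }
}

/// Placeholder default: assign each file, largest first, to the partition
/// that currently holds the fewest bytes.
fn default_repartition(
    groups: &[FileGroup],
    target_partitions: usize,
) -> Option<Vec<FileGroup>> {
    let mut sizes: Vec<u64> = groups
        .iter()
        .flat_map(|g| g.file_sizes.iter().copied())
        .collect();
    if target_partitions == 0 || sizes.is_empty() {
        return None;
    }
    sizes.sort_unstable_by(|a, b| b.cmp(a));

    let mut out = vec![FileGroup { file_sizes: Vec::new() }; target_partitions];
    let mut loads = vec![0u64; target_partitions];
    for size in sizes {
        // Index of the least-loaded partition (first one wins on ties).
        let idx = loads
            .iter()
            .enumerate()
            .min_by_key(|(_, load)| **load)
            .map(|(i, _)| i)?;
        out[idx].file_sizes.push(size);
        loads[idx] += size;
    }
    Some(out)
}

/// A source that accepts the default behavior.
struct SplittableSource;
impl FileSource for SplittableSource {}

/// A source that opts out by overriding the method, with no boolean flag.
struct OpaqueSource;
impl FileSource for OpaqueSource {
    fn repartitioned(&self, _groups: &[FileGroup], _target: usize) -> Option<Vec<FileGroup>> {
        None
    }
}

/// Every call site goes through the same method and falls back on `None`.
fn plan_groups(source: &dyn FileSource, groups: Vec<FileGroup>, target: usize) -> Vec<FileGroup> {
    source.repartitioned(&groups, target).unwrap_or(groups)
}

fn main() {
    let groups = vec![
        FileGroup { file_sizes: vec![100, 10, 10] },
        FileGroup { file_sizes: vec![] },
    ];
    println!("{:?}", plan_groups(&SplittableSource, groups.clone(), 2));
    println!("{:?}", plan_groups(&OpaqueSource, groups, 2));
}
```

With this shape, a caller never has to remember to check a capability flag before repartitioning, so the check and the action cannot drift apart.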