kosiew commented on code in PR #23189:
URL: https://github.com/apache/datafusion/pull/23189#discussion_r3522926868
##########
datafusion/datasource/src/file_scan_config/mod.rs:
##########
@@ -204,17 +204,6 @@ pub struct FileScanConfig {
/// would be incorrect if there are filters being applied, thus this
should be accessed
/// via [`FileScanConfig::statistics`].
pub(crate) statistics: Statistics,
- /// When true, file_groups are organized by partition column values
- /// and output_partitioning will return Hash partitioning on partition
columns.
- /// This allows the optimizer to skip hash repartitioning for aggregates
and joins
- /// on partition columns.
- ///
- /// If the number of file partitions > target_partitions, the file
partitions will be grouped
- /// in a round-robin fashion such that number of file partitions =
target_partitions.
- ///
- /// Follow-up: remove this redundant field in favor of
- /// `output_partitioning`, see
<https://github.com/apache/datafusion/issues/23099>.
- pub partitioned_by_file_group: bool,
Review Comment:
Thanks for clarifying the intended direction here. I agree that
`output_partitioning` should stay the canonical representation, and the legacy
protobuf read fallback covers the wire compatibility concern I raised earlier.
The remaining issue is release-facing documentation. Since removing
`FileScanConfig::partitioned_by_file_group` is a breaking Rust API change,
could you please document the migration in the version-specific upgrade guide?
The guide should point users from `partitioned_by_file_group` to
`output_partitioning`.
##########
datafusion/datasource/src/file_scan_config/mod.rs:
##########
@@ -519,18 +506,6 @@ impl FileScanConfigBuilder {
self
}
- /// Set whether file groups are organized by partition column values.
- ///
- /// When set to true, the output partitioning will be declared as Hash
partitioning
- /// on the partition columns.
- pub fn with_partitioned_by_file_group(
Review Comment:
Same request for the builder API removal. Since
`with_partitioned_by_file_group(...)` is going away, downstream users need a
clear upgrade note.
Could you please document the replacement as
`with_output_partitioning(...)`, including the expected pattern of building
`Partitioning::Hash` over the table partition columns and passing it through
`FileScanConfigBuilder::with_output_partitioning(Some(...))`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]