gene-bordegaray commented on code in PR #19073:
URL: https://github.com/apache/datafusion/pull/19073#discussion_r2636630791
##########
datafusion/core/src/physical_planner.rs:
##########
@@ -1599,6 +1603,25 @@ impl DefaultPhysicalPlanner {
}
}
+fn has_sufficient_rows_for_repartition(
+ input: &Arc<dyn ExecutionPlan>,
+ session_state: &SessionState,
+) -> Result<bool> {
+ // Get partition statistics, default to repartitioning if unavailable
+ let stats = match input.partition_statistics(None) {
+ Ok(s) => s,
+ Err(_) => return Ok(true),
+ };
+
+ if let Some(num_rows) = stats.num_rows.get_value().copied() {
Review Comment:
Throwing my two cents in here. I think this configuration would be great:
letting users "turn knobs" is a good way to make DataFusion extensible, and I
have experimented with this myself.
I see a use for this configuration in my work, but I think this fallback
behavior should not exist alongside the min_size configuration. As a user, if
I turn a knob to, say, declare a min_size, I would prefer that the planner
stick to that setting without this fallback behavior.
Let me know your thoughts on this.
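A minimal, self-contained sketch of one possible reading of this suggestion follows. It assumes the fallback in question is the `Err(_) => return Ok(true)` branch in the diff (repartition when statistics are unavailable). It models the configured threshold and the reported row count as plain `Option<usize>` values. `should_repartition`, `num_rows`, and `min_rows` are hypothetical names, not DataFusion APIs, and the inclusive `>=` comparison is also an assumption.

```rust
/// Hypothetical decision helper, not part of DataFusion.
/// `num_rows`: row count from statistics; `None` models "unavailable or unknown".
/// `min_rows`: user-configured threshold; `None` models "knob not set".
fn should_repartition(num_rows: Option<usize>, min_rows: Option<usize>) -> bool {
    match (min_rows, num_rows) {
        // No threshold configured: keep the existing default and repartition.
        (None, _) => true,
        // Threshold configured and row count known: honor the threshold.
        (Some(min), Some(rows)) => rows >= min,
        // Threshold configured but row count unknown: stick to the knob
        // strictly and do NOT fall back to repartitioning.
        (Some(_), None) => false,
    }
}
```

With no threshold configured, the existing default (repartition) is kept. Once a threshold is set, an unknown row count no longer triggers repartitioning, which is the "stick to the knob" behavior described above.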