berkaysynnada commented on code in PR #15563: URL: https://github.com/apache/datafusion/pull/15563#discussion_r2033192038
##########
datafusion/physical-plan/src/sorts/sort.rs:
##########
@@ -1066,22 +1071,33 @@ impl SortExec {
     }
     /// This function creates the cache object that stores the plan properties such as schema, equivalence properties, ordering, partitioning, etc.
+    /// It also returns the common sort prefix between the input and the sort expressions.
     fn compute_properties(
         input: &Arc<dyn ExecutionPlan>,
         sort_exprs: LexOrdering,
         preserve_partitioning: bool,
-    ) -> PlanProperties {
+    ) -> (PlanProperties, LexOrdering) {
         // Determine execution mode:
         let requirement = LexRequirement::from(sort_exprs);
-        let sort_satisfied = input
+
+        let (sort_prefix, sort_satisfied) = input
             .equivalence_properties()
-            .ordering_satisfy_requirement(&requirement);
+            .extract_matching_prefix(&requirement);
+
+        let sort_partially_satisfied = sort_satisfied || !sort_prefix.is_empty();
         // The emission type depends on whether the input is already sorted:
-        // - If already sorted, we can emit results in the same way as the input
+        // - If already fully sorted, we can emit results in the same way as the input
+        // - If partially sorted, we might be able to emit results incrementally, but it is not guaranteed (Both)
         // - If not sorted, we must wait until all data is processed to emit results (Final)
         let emission_type = if sort_satisfied {
             input.pipeline_behavior()
+        } else if sort_partially_satisfied {
+            if input.pipeline_behavior() == EmissionType::Incremental {
+                EmissionType::Both

Review Comment:
That risk is always there. For example, aggregation groups might never close, but we don't mark aggregation as "Both". "Both" is currently used for joins, where there is an exact accumulation inside the operator and that accumulated data can only be output once the input is consumed.
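To make the branch under discussion concrete, here is a minimal, self-contained sketch. It is not DataFusion code: the `EmissionType` enum is a local stand-in that only mirrors the variant names visible in the diff, and `sort_emission_type` with its `partial_label` parameter are hypothetical names introduced for illustration. The fallback for a non-incremental input is also an assumption, because the quoted hunk is cut off before that branch.

```rust
/// Local stand-in for the emission type; only the variant names are taken
/// from the diff above. This is not the DataFusion definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EmissionType {
    Incremental,
    Final,
    Both,
}

/// Hypothetical helper mirroring the branch structure in the diff.
///
/// `partial_label` is the value reported for a partially sorted input whose
/// own behavior is incremental. The diff uses `Both`; the review comment
/// questions that choice.
fn sort_emission_type(
    input: EmissionType,
    fully_sorted: bool,
    has_sorted_prefix: bool,
    partial_label: EmissionType,
) -> EmissionType {
    if fully_sorted {
        // Input already satisfies the ordering: behave like the input.
        input
    } else if has_sorted_prefix {
        if input == EmissionType::Incremental {
            partial_label
        } else {
            // Assumption: the quoted hunk ends before this branch, so
            // passing the input's behavior through is a guess.
            input
        }
    } else {
        // No usable ordering: all input must be consumed before emitting.
        EmissionType::Final
    }
}

fn main() {
    use EmissionType::*;

    // Labelling as written in the diff.
    println!("{:?}", sort_emission_type(Incremental, false, true, Both));
    // One possible alternative if partial sorts are not treated like joins.
    println!("{:?}", sort_emission_type(Incremental, false, true, Incremental));
    // Unsorted input always yields Final.
    println!("{:?}", sort_emission_type(Incremental, false, false, Both));
}
```

Keeping the label as a parameter makes it easy to compare the two readings side by side; it is not a suggestion that the real function should take such an argument.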
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org