alamb commented on code in PR #15469:
URL: https://github.com/apache/datafusion/pull/15469#discussion_r2020125627


##########
datafusion/physical-plan/src/sorts/sort.rs:
##########
@@ -397,16 +393,13 @@ impl ExternalSorter {
         self.metrics.spill_metrics.spill_file_count.value()
     }
 
-    /// When calling, all `in_mem_batches` must be sorted (*), and then all of 
them will
-    /// be appended to the in-progress spill file.
-    ///
-    /// (*) 'Sorted' here means globally sorted for all buffered batches when 
the
-    /// memory limit is reached, instead of partially sorted within the batch.
-    async fn spill_append(&mut self) -> Result<()> {
-        assert!(self.in_mem_batches_sorted);
-
-        // we could always get a chance to free some memory as long as we are 
holding some
-        if self.in_mem_batches.is_empty() {
+    /// Appending globally sorted batches to the in-progress spill file, and 
clears
+    /// the `globally_sorted_batches` (also its memory reservation) afterwards.
+    async fn consume_and_spill_append(
+        &mut self,
+        globally_sorted_batches: &mut Vec<RecordBatch>,

Review Comment:
   thank you for the name here -- much better



##########
datafusion/physical-plan/src/sorts/sort.rs:
##########
@@ -216,10 +216,8 @@ struct ExternalSorter {
     // STATE BUFFERS:
     // Fields that hold intermediate data during sorting
     // ========================================================================
-    /// Potentially unsorted in memory buffer
+    /// Unsorted input batches stored in the memory buffer

Review Comment:
   Removing this field is an improvement 👍  thanks @2010YOUY01 and @comphead 
   
   Given there are still three fields whose relationship is tied, I wonder if 
it would improve the code if we encoded that relationship in an actual rust 
`enum` -- for example
   
   ```rust
   enum SortState {
     /// All data is in memory
     Memory {
       batches: Vec<RecordBatch>,
     }, 
     /// intermediate data is spilling to disk
     Spilling {
       batches: Vec<RecordBatch>,
       in_progress_spill_file: InProgressSpillFile,
     }
   ...
   }
   ```
   
   🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to