alamb commented on code in PR #15355:
URL: https://github.com/apache/datafusion/pull/15355#discussion_r2009075716


##########
datafusion/physical-plan/src/sorts/sort.rs:
##########
@@ -230,9 +219,14 @@ struct ExternalSorter {
     /// if `Self::in_mem_batches` are sorted
     in_mem_batches_sorted: bool,
 
-    /// If data has previously been spilled, the locations of the
-    /// spill files (in Arrow IPC format)
-    spills: Vec<RefCountedTempFile>,
+    /// During external sorting, in-memory intermediate data will be appended to
+    /// this file incrementally. Once finished, this file will be moved to [`Self::finished_spill_files`].
+    in_progress_spill_file: Option<InProgressSpillFile>,
+    /// If data has previously been spilled, the locations of the spill files (in
+    /// Arrow IPC format)
+    /// Within the same spill file, the data might be chunked into multiple batches,
+    /// and ordered by sort keys.
+    finished_spill_files: Vec<RefCountedTempFile>,
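The two new fields describe a lifecycle: sorted data is appended to an in-progress file, and when that file is finished it moves to `finished_spill_files` as one sorted run that may span several batches. The sketch below illustrates only that lifecycle. It is not DataFusion code. `InProgressSpillFile`, `RefCountedTempFile`, `ExternalSorterSketch` and `spill_in_mem_batches` are hypothetical in-memory stand-ins, and `Vec<i32>` takes the place of Arrow record batches and IPC files.

```rust
use std::rc::Rc;

// Hypothetical stand-in for RefCountedTempFile: a shared, read-only list of
// batches instead of an on-disk Arrow IPC file.
type RefCountedTempFile = Rc<Vec<Vec<i32>>>;

/// Hypothetical stand-in for InProgressSpillFile: batches are appended
/// incrementally until `finish` turns it into a read-only file.
#[derive(Default)]
struct InProgressSpillFile {
    batches: Vec<Vec<i32>>,
}

impl InProgressSpillFile {
    fn append_batch(&mut self, batch: Vec<i32>) {
        self.batches.push(batch);
    }

    fn finish(self) -> RefCountedTempFile {
        Rc::new(self.batches)
    }
}

/// Hypothetical sorter that keeps only the two spill-related fields.
#[derive(Default)]
struct ExternalSorterSketch {
    in_progress_spill_file: Option<InProgressSpillFile>,
    finished_spill_files: Vec<RefCountedTempFile>,
}

impl ExternalSorterSketch {
    /// Sort all in-memory batches together, write the result to the
    /// in-progress file in `batch_size` chunks, then move the finished file
    /// to `finished_spill_files`. Each finished file is one sorted run.
    fn spill_in_mem_batches(&mut self, in_mem_batches: Vec<Vec<i32>>, batch_size: usize) {
        let mut rows: Vec<i32> = in_mem_batches.into_iter().flatten().collect();
        rows.sort_unstable();

        // Create the in-progress file lazily on first use.
        let file = self
            .in_progress_spill_file
            .get_or_insert_with(InProgressSpillFile::default);
        for chunk in rows.chunks(batch_size) {
            file.append_batch(chunk.to_vec());
        }

        // Once finished, the file moves to `finished_spill_files`.
        if let Some(file) = self.in_progress_spill_file.take() {
            self.finished_spill_files.push(file.finish());
        }
    }
}

fn main() {
    let mut sorter = ExternalSorterSketch::default();
    sorter.spill_in_mem_batches(vec![vec![5, 1], vec![4, 2, 3]], 2);
    sorter.spill_in_mem_batches(vec![vec![9, 0]], 2);

    for (i, file) in sorter.finished_spill_files.iter().enumerate() {
        println!("spill file {i}: {:?}", file);
    }
}
```

This sketch sorts by concatenating the batches and sorting once, purely for brevity. A real external sorter would produce each run with a streaming merge rather than materializing every row.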

Review Comment:
   The different semantics for different operations make sense to me.
   
   I was thinking more mechanically, like just storing the 
`Vec<RefCountedTempFile>` as a field on `SortManager` and allowing Sort and 
Hash, etc. to access / manipulate it as required. I think it is fine to consider 
this in a future PR as well.
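Here is a rough illustration of the "more mechanical" option the comment describes, where one shared struct owns the `Vec<RefCountedTempFile>` and each operator decides what the files mean. This is a hypothetical sketch, not a proposed API. The struct reuses the `SortManager` name from the comment, but its accessor methods, the `Rc<Vec<i32>>` stand-in for spill files, and the two operator helpers are all assumptions.

```rust
use std::rc::Rc;

// Hypothetical stand-in for a finished spill file: an in-memory Vec
// instead of DataFusion's on-disk RefCountedTempFile.
type RefCountedTempFile = Rc<Vec<i32>>;

/// Hypothetical shared holder. It only stores the finished spill files and
/// leaves their interpretation to each operator.
#[derive(Default)]
struct SortManager {
    spill_files: Vec<RefCountedTempFile>,
}

impl SortManager {
    fn spill_files(&self) -> &[RefCountedTempFile] {
        &self.spill_files
    }

    fn spill_files_mut(&mut self) -> &mut Vec<RefCountedTempFile> {
        &mut self.spill_files
    }
}

/// Sort-style use (hypothetical): every file is a sorted run, so take all of
/// them at once and merge. Concatenate-and-sort stands in for a k-way merge.
fn sort_merge_all(manager: &mut SortManager) -> Vec<i32> {
    let runs = std::mem::take(manager.spill_files_mut());
    let mut merged: Vec<i32> = runs.iter().flat_map(|run| run.iter().copied()).collect();
    merged.sort_unstable();
    merged
}

/// Hash-style use (hypothetical): every file is an independent partition,
/// processed one at a time.
fn hash_next_partition(manager: &mut SortManager) -> Option<RefCountedTempFile> {
    manager.spill_files_mut().pop()
}

fn main() {
    let mut sort_side = SortManager::default();
    sort_side.spill_files_mut().push(Rc::new(vec![1, 4]));
    sort_side.spill_files_mut().push(Rc::new(vec![2, 3]));
    println!("sort side holds {} runs", sort_side.spill_files().len());
    println!("merged: {:?}", sort_merge_all(&mut sort_side));

    let mut hash_side = SortManager::default();
    hash_side.spill_files_mut().push(Rc::new(vec![10]));
    hash_side.spill_files_mut().push(Rc::new(vec![20]));
    while let Some(partition) = hash_next_partition(&mut hash_side) {
        println!("processing partition {:?}", partition);
    }
}
```

In this sketch the shared struct carries no sort- or hash-specific semantics. That matches the comment's point: the semantics can differ per operator while the bookkeeping lives in one place.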



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
