alamb commented on code in PR #11218:
URL: https://github.com/apache/datafusion/pull/11218#discussion_r1683325423


##########
datafusion/physical-plan/src/spill.rs:
##########
@@ -85,3 +85,104 @@ fn read_spill(sender: Sender<Result<RecordBatch>>, path: 
&Path) -> Result<()> {
     }
     Ok(())
 }
+
+/// Spill the `RecordBatch` to disk as smaller batches
+/// split by `batch_size_rows`
+/// Return `total_rows` what is spilled
+pub fn spill_record_batch_by_size(

Review Comment:
   I think it may be left over from an earlier version of this PR



##########
datafusion/physical-plan/src/joins/sort_merge_join.rs:
##########
@@ -565,7 +583,7 @@ impl StreamedBatch {
 #[derive(Debug)]
 struct BufferedBatch {
     /// The buffered record batch
-    pub batch: RecordBatch,
+    pub batch: Option<RecordBatch>,

Review Comment:
   While reviewing this PR, I found having to reason about what the valid 
`batch` or `spill_file` combinations was confusing (like there is an invariant 
I think that they can't both be Some)
   
   Rather than use two fields, I tried making an enum that encoded the state 
and I thought it was easier to reason about. Here is a proposal here: 
https://github.com/comphead/arrow-datafusion/pull/297#



##########
datafusion/physical-plan/src/joins/sort_merge_join.rs:
##########
@@ -858,6 +889,57 @@ impl SMJStream {
         }
     }
 
+    fn free_reservation(&mut self, buffered_batch: BufferedBatch) -> 
Result<()> {
+        // Shrink memory usage for in memory batches only
+        if buffered_batch.spill_file.is_none() && 
buffered_batch.batch.is_some() {
+            self.reservation
+                .try_shrink(buffered_batch.size_estimation)?;
+        }

Review Comment:
   I think those cases are not possible but the current code doesn't make that 
clear
   
   Here is a proposal that I think makes it clearer what states are possible: 
https://github.com/comphead/arrow-datafusion/pull/297



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to