ding-young opened a new issue, #17340:
URL: https://github.com/apache/datafusion/issues/17340

   ### Describe the bug
   
   This issue was observed in https://github.com/apache/datafusion/pull/17029, where the memory size of a `RecordBatch` read back from a spill file (via Arrow IPC) is significantly larger than the size recorded before spilling.
   
   Some increase is expected because of additional metadata or encoding during the IPC write, but in many cases the difference is much larger than that would explain. We should investigate where this memory growth comes from and minimize the discrepancy as much as possible, because we rely on the maximum memory size recorded at spill time to decide how many spill files can be read back at once.
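To illustrate why the discrepancy matters, here is a minimal sketch (not DataFusion's actual merge logic) of how a per-file maximum batch size recorded at spill time could be used to bound the number of files merged at once. The helpers `max_mergeable_files` and `actual_bytes`, the one-batch-per-file assumption, and the 1.5x growth factor are all hypothetical.

```rust
/// Hypothetical helper: how many spill files can be merged at once, given a
/// memory budget and the largest batch size (in bytes) recorded for each file
/// at spill time. Assumes exactly one in-flight batch per file.
fn max_mergeable_files(budget_bytes: usize, recorded_max_batch_bytes: &[usize]) -> usize {
    let mut used = 0usize;
    let mut count = 0usize;
    for &size in recorded_max_batch_bytes {
        // Stop as soon as adding the next file would exceed the budget.
        match used.checked_add(size) {
            Some(next) if next <= budget_bytes => {
                used = next;
                count += 1;
            }
            _ => break,
        }
    }
    count
}

/// Hypothetical: bytes actually needed if every batch grows by `growth`
/// (e.g. 1.5 = +50%) after being read back from its spill file.
fn actual_bytes(recorded: &[usize], growth: f64) -> usize {
    recorded.iter().map(|&s| (s as f64 * growth).ceil() as usize).sum()
}

fn main() {
    let budget = 1_000;
    let recorded = [300, 300, 300, 300];
    let n = max_mergeable_files(budget, &recorded);
    println!("files merged at once: {n}");
    let needed = actual_bytes(&recorded[..n], 1.5);
    println!("planned <= {budget} bytes, actually needed {needed} bytes");
}
```

With a 1,000-byte budget and four files recorded at 300 bytes each, three files are admitted; if each batch is 50% larger after being read back, the merge actually needs 1,350 bytes and overshoots the budget.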
   
   ### To Reproduce
   
   Run `cargo test -p datafusion memory_limit::test_stringview_external_sort -- --exact --nocapture` in the related PR above.
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   One cause was incorrect memory accounting for `StringViewArray`. However, even after that fix (https://github.com/apache/datafusion/pull/17315), the validation still fails.
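One way `StringViewArray` accounting can drift, which may or may not be what the linked fix addresses: in the Arrow string-view layout each view is 16 bytes, strings of at most 12 bytes are stored inline in the view, and longer strings live in separate data buffers that slices keep referencing. The simplified model below (`ViewColumn` and its methods are hypothetical, not the arrow-rs API) shows how summing per-slice sizes counts a shared data buffer more than once.

```rust
/// Simplified, hypothetical model of a string-view column (not arrow-rs's
/// `StringViewArray`). Each view is 16 bytes; strings longer than 12 bytes
/// are stored out of line in a data buffer shared by all slices.
struct ViewColumn {
    num_views: usize,
    data_buffer_lens: Vec<usize>,
}

const VIEW_SIZE: usize = 16;
const MAX_INLINE_LEN: usize = 12;

impl ViewColumn {
    /// Build a column from string lengths; long strings go into one data buffer.
    fn from_lengths(lengths: &[usize]) -> Self {
        let out_of_line: usize = lengths.iter().copied().filter(|&l| l > MAX_INLINE_LEN).sum();
        let data_buffer_lens = if out_of_line > 0 { vec![out_of_line] } else { Vec::new() };
        ViewColumn { num_views: lengths.len(), data_buffer_lens }
    }

    /// Models a zero-copy slice: fewer views, but every data buffer is still referenced.
    fn slice(&self, offset: usize, len: usize) -> Self {
        let available = self.num_views.saturating_sub(offset);
        ViewColumn { num_views: len.min(available), data_buffer_lens: self.data_buffer_lens.clone() }
    }

    /// Naive size: views plus the full length of every referenced data buffer.
    fn memory_size(&self) -> usize {
        self.num_views * VIEW_SIZE + self.data_buffer_lens.iter().sum::<usize>()
    }
}

fn main() {
    // Two inline strings (5 and 4 bytes) and two out-of-line strings (20 and 30 bytes).
    let full = ViewColumn::from_lengths(&[5, 20, 4, 30]);
    let (a, b) = (full.slice(0, 2), full.slice(2, 2));
    println!("whole column: {} bytes", full.memory_size());
    println!("sum of slices: {} bytes", a.memory_size() + b.memory_size());
}
```

In this model, deduplicating shared buffers, or compacting slices before measuring (as arrow-rs's `gc` method on view arrays does), avoids the double count.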

