[PR] chore(rust/sedona-spatial-join): More accurate batch in-memory size estimation [sedona-db]

via GitHub Wed, 14 Jan 2026 07:29:27 -0800


Kontinuation opened a new pull request, #515:
URL: https://github.com/apache/sedona-db/pull/515


   This patch improves the accuracy of memory usage estimation by implementing 
our own functions for estimating the in-memory sizes of record batches and 
Arrow arrays.
   
   The rationale is similar to https://github.com/apache/datafusion/pull/13377. 
If we call `RecordBatch::get_array_memory_size` instead of rolling our own 
estimation function, we get wildly inaccurate numbers for spilled batches 
read back using `arrow::ipc::reader::StreamReader`.
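
   To illustrate one way a per-array estimate can drift far from real usage, 
here is a minimal sketch. It assumes the inaccuracy comes from several columns 
being views into one shared allocation, which this description does not state. 
`BufferView`, `shared_columns`, `naive_size`, `deduplicated_size` and 
`logical_size` are hypothetical stand-ins built on the standard library only; 
they are not arrow-rs APIs and not necessarily what this patch implements.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for a column buffer: a view into an allocation
/// that may be shared with other views. Not an arrow-rs type.
#[derive(Clone)]
struct BufferView {
    allocation: Arc<Vec<u8>>, // backing memory, possibly shared
    len: usize,               // bytes this view actually covers
}

/// Build `n` views that all point at one allocation of `bytes` bytes
/// (assumed here to mimic columns decoded from a single message).
fn shared_columns(n: usize, bytes: usize) -> Vec<BufferView> {
    let allocation = Arc::new(vec![0u8; bytes]);
    (0..n)
        .map(|_| BufferView {
            allocation: Arc::clone(&allocation),
            len: bytes / n,
        })
        .collect()
}

/// Naive estimate: charge every view for its whole backing allocation,
/// so a shared allocation is counted once per view.
fn naive_size(columns: &[BufferView]) -> usize {
    columns.iter().map(|c| c.allocation.len()).sum()
}

/// Deduplicated estimate: charge each distinct allocation only once,
/// identified by the address the `Arc` points to.
fn deduplicated_size(columns: &[BufferView]) -> usize {
    let mut seen = HashSet::new();
    columns
        .iter()
        .filter(|c| seen.insert(Arc::as_ptr(&c.allocation)))
        .map(|c| c.allocation.len())
        .sum()
}

/// Logical estimate: only the bytes each view covers.
fn logical_size(columns: &[BufferView]) -> usize {
    columns.iter().map(|c| c.len).sum()
}
```

   With four views over one 1024-byte allocation (`shared_columns(4, 1024)`), 
`naive_size` returns 4096 while `deduplicated_size` and `logical_size` both 
return 1024, so the naive figure grows with the number of columns rather than 
with the memory actually held.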
   
   Future work: use the memory pool API of arrow-rs for more accurate memory 
usage accounting. See https://github.com/apache/arrow-rs/issues/8137.


