Kontinuation opened a new pull request, #515: URL: https://github.com/apache/sedona-db/pull/515
This patch improves the accuracy of memory usage estimation by implementing our own functions for estimating the in-memory sizes of record batches and Arrow arrays. The rationale is similar to https://github.com/apache/datafusion/pull/13377.

If we call `RecordBatch::get_array_memory_size` instead of using our own estimation functions, we get wildly inaccurate numbers for spilled batches read back using `arrow::ipc::reader::StreamReader`.

Future work: use the memory pool API of arrow-rs for more accurate memory usage accounting. See https://github.com/apache/arrow-rs/issues/8137.
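
As a rough illustration of why per-array accounting can overshoot, here is a minimal, standard-library-only sketch. It assumes (this is not stated in the PR description) that arrays decoded from one IPC message are slices of a single shared body buffer, so charging every array for the whole buffer counts the same allocation several times. `SlicedArray`, `naive_size`, `deduplicated_size`, `logical_size`, and `example_estimates` are hypothetical names for this sketch; they are not part of arrow-rs or of this patch.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow array: a window into a shared,
/// reference-counted byte buffer (assumed model, not the arrow-rs types).
struct SlicedArray {
    buffer: Arc<[u8]>,
    offset: usize,
    len: usize,
}

impl SlicedArray {
    /// The bytes this array actually refers to.
    fn bytes(&self) -> &[u8] {
        &self.buffer[self.offset..self.offset + self.len]
    }
}

/// Naive estimate: charge every array for the full backing buffer.
fn naive_size(arrays: &[SlicedArray]) -> usize {
    arrays.iter().map(|a| a.buffer.len()).sum()
}

/// Deduplicated estimate: count each distinct backing allocation once,
/// identified by the address of its first byte.
fn deduplicated_size(arrays: &[SlicedArray]) -> usize {
    let mut seen: HashSet<*const u8> = HashSet::new();
    arrays
        .iter()
        .filter(|a| seen.insert(a.buffer.as_ptr()))
        .map(|a| a.buffer.len())
        .sum()
}

/// Logical estimate: count only the bytes each array refers to.
fn logical_size(arrays: &[SlicedArray]) -> usize {
    arrays.iter().map(|a| a.bytes().len()).sum()
}

/// Builds four 1 KiB arrays that all slice one 4 KiB buffer and returns
/// the (naive, deduplicated, logical) estimates for them.
fn example_estimates() -> (usize, usize, usize) {
    let body: Arc<[u8]> = Arc::from(vec![0u8; 4096]);
    let arrays: Vec<SlicedArray> = (0..4)
        .map(|i| SlicedArray {
            buffer: Arc::clone(&body),
            offset: i * 1024,
            len: 1024,
        })
        .collect();
    (
        naive_size(&arrays),
        deduplicated_size(&arrays),
        logical_size(&arrays),
    )
}
```

In this toy model, `example_estimates()` reports 16384 bytes for the naive estimate but 4096 bytes for both the deduplicated and the logical estimates. The gap grows with the number of arrays that share one buffer, which is the sort of error a custom estimator would aim to avoid.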
