ding-young opened a new issue, #17340: URL: https://github.com/apache/datafusion/issues/17340
### Describe the bug

This issue was observed in https://github.com/apache/datafusion/pull/17029: the memory size of a `RecordBatch` after reading it back from a spill file (via Arrow IPC) is significantly larger than the size recorded before spilling. Some increase is expected because the IPC write adds metadata and encoding, but in many cases the difference is far larger than that can account for.

We should investigate where this memory growth comes from and minimize the discrepancy as much as possible, because we rely on the maximum memory size recorded at spill time to determine how many spill files can be read back at once.

### To Reproduce

In the related PR above, run `cargo test -p datafusion memory_limit::test_stringview_external_sort -- --exact --nocapture`.

### Expected behavior

_No response_

### Additional context

One cause was incorrect memory accounting for `StringViewArray`. However, even after that fix (https://github.com/apache/datafusion/pull/17315), validation still fails.
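To illustrate why an underestimated pre-spill size matters, here is a minimal std-only sketch. It assumes a simplified budgeting rule in which the number of spill files read back at once is the memory budget divided by the largest batch size recorded at spill time. The function names (`max_mergeable_files`, `actual_merge_usage`) and the byte counts are hypothetical and purely illustrative. They are not DataFusion's actual API or measurements from this issue.

```rust
/// Hypothetical budgeting rule (not DataFusion's actual code): how many spill
/// files can be read back at once if each one is assumed to need
/// `recorded_max_batch_bytes`. Returns `None` when the recorded size is zero.
fn max_mergeable_files(memory_budget: usize, recorded_max_batch_bytes: usize) -> Option<usize> {
    memory_budget.checked_div(recorded_max_batch_bytes)
}

/// Memory actually held when `files` spill streams are open and each
/// read-back batch occupies `read_back_batch_bytes` after IPC decoding.
fn actual_merge_usage(files: usize, read_back_batch_bytes: usize) -> usize {
    files.saturating_mul(read_back_batch_bytes)
}

fn main() {
    const MIB: usize = 1024 * 1024;
    // Illustrative numbers only, not measurements from the issue.
    let budget = 100 * MIB;
    let recorded_before_spill = 10 * MIB;
    let size_after_read = 25 * MIB;

    // The plan is computed from the size recorded before spilling...
    let files = max_mergeable_files(budget, recorded_before_spill)
        .expect("recorded batch size must be non-zero");
    // ...but the memory actually used depends on the size after reading back.
    let needed = actual_merge_usage(files, size_after_read);

    println!("merging {files} files needs {needed} bytes; budget is {budget} bytes");
    // The plan stays within budget only if read-back batches are no larger
    // than the recorded ones; here they are 2.5x larger, so it overshoots.
    assert!(needed > budget);
}
```

With these illustrative numbers, the recorded size permits reading back 10 files at once, but because each read-back batch is 2.5x larger than recorded, the merge needs 2.5x the budget. This is why the gap between the pre-spill and post-read sizes needs to be small.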