Kontinuation commented on issue #12136: URL: https://github.com/apache/datafusion/issues/12136#issuecomment-2656400964
I have also run into this problem with string views. DataFusion uses the `interleave` function to produce merged batches, and `interleave` tends to produce batches with a very large reported size due to https://github.com/apache/arrow-rs/pull/6779. The output simply references the data buffers of the interleaved arrays, so it does not actually take extra memory, but it makes the result of `get_record_batch_memory_size(batch)` or `batch.get_array_memory_size()` very large, which is likely to cause memory reservation failures. When spilling happens, these interleaved arrays are serialized using Arrow IPC and produce very large binaries. When we read them back in the spill-read phase, we have to allocate very large buffers for these arrays, which makes things much worse.
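
To make the accounting problem concrete, here is a minimal std-only sketch of the mechanism described above. It is a toy model, not the arrow-rs implementation: `ToyViewArray`, `toy_interleave`, `reported_size`, and `referenced_size` are hypothetical names invented for illustration, and the model ignores details of the real string view layout such as inlined short strings.

```rust
use std::sync::Arc;

/// Toy stand-in for a string view array (hypothetical; not the arrow-rs API).
struct ToyViewArray {
    /// Each view is (buffer index, byte offset, byte length).
    views: Vec<(usize, usize, usize)>,
    /// Data buffers are reference-counted, so sharing one copies no bytes.
    buffers: Vec<Arc<Vec<u8>>>,
}

impl ToyViewArray {
    /// Packs all values into a single data buffer, one view per value.
    fn from_strs(values: &[&str]) -> Self {
        let mut data = Vec::new();
        let mut views = Vec::new();
        for v in values {
            views.push((0, data.len(), v.len()));
            data.extend_from_slice(v.as_bytes());
        }
        Self { views, buffers: vec![Arc::new(data)] }
    }

    /// Size as a naive accounting pass sees it: every referenced
    /// buffer is counted in full, whether or not its bytes are used.
    fn reported_size(&self) -> usize {
        self.buffers.iter().map(|b| b.len()).sum()
    }

    /// Bytes that the views actually point at.
    fn referenced_size(&self) -> usize {
        self.views.iter().map(|v| v.2).sum()
    }

    /// Resolves one view back to its string.
    fn value(&self, i: usize) -> &str {
        let (b, off, len) = self.views[i];
        std::str::from_utf8(&self.buffers[b][off..off + len]).unwrap()
    }
}

/// Toy interleave: selects rows by (input index, row index) and keeps a
/// reference to every input buffer instead of copying the selected bytes.
fn toy_interleave(inputs: &[&ToyViewArray], picks: &[(usize, usize)]) -> ToyViewArray {
    let mut buffers = Vec::new();
    let mut base = Vec::new(); // first output buffer index for each input
    for a in inputs {
        base.push(buffers.len());
        buffers.extend(a.buffers.iter().cloned()); // Arc clone, no byte copy
    }
    let views = picks
        .iter()
        .map(|&(a, r)| {
            let (b, off, len) = inputs[a].views[r];
            (base[a] + b, off, len)
        })
        .collect();
    ToyViewArray { views, buffers }
}
```

In this model, interleaving one 20-byte row from each of two inputs that hold 40 bytes apiece yields an output that points at 40 bytes but reports 80. A serializer that writes every referenced buffer in full would also write 80 bytes, and a reader would then have to allocate that much for real, which is the spill amplification described above.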