Re: [I] External sorting not working for (maybe only for string columns??) [datafusion]

via GitHub Thu, 13 Feb 2025 04:07:16 -0800


Kontinuation commented on issue #12136:
URL: https://github.com/apache/datafusion/issues/12136#issuecomment-2656400964


   I have also encountered the same problem with string views.
   
   DataFusion uses the `interleave` function to produce merged batches, and 
`interleave` tends to produce very large batches because of 
https://github.com/apache/arrow-rs/pull/6779. The output only references the 
data buffers of the interleaved arrays, so it does not actually take extra 
memory, but `get_record_batch_memory_size(batch)` and 
`batch.get_array_memory_size()` report a very large size for it, which is 
likely to cause memory reservation failures.
   
   When spilling happens, these interleaved arrays are serialized using 
Arrow IPC, which produces very large binaries. When we read them back in the 
spill-read phase, we have to allocate very large buffers for these arrays, 
which makes things much worse.
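
   To make the accounting problem concrete, here is a toy model in plain Rust 
(std only). It does not use arrow-rs or DataFusion; `ToyViewArray`, 
`toy_array`, `toy_interleave`, `reported_size` and `spilled_bytes` are 
hypothetical names invented for this sketch, and the 16-byte view size is an 
assumption of the model. It only mimics the behaviour described above: the 
merged array keeps every source data buffer alive, so a size computed from the 
buffers is far larger than the bytes the selected rows actually use.

```rust
use std::rc::Rc;

/// Assumed per-row size of a view in this toy model.
const VIEW_BYTES: usize = 16;

/// Toy stand-in for a string-view array (NOT the arrow-rs type).
/// Each view is (buffer index, offset, length) into a shared data buffer.
struct ToyViewArray {
    buffers: Vec<Rc<Vec<u8>>>,
    views: Vec<(usize, usize, usize)>,
}

impl ToyViewArray {
    /// Size as seen by buffer-based accounting: every referenced buffer
    /// is counted in full, however few rows point into it.
    fn reported_size(&self) -> usize {
        let data: usize = self.buffers.iter().map(|b| b.len()).sum();
        data + self.views.len() * VIEW_BYTES
    }

    /// Bytes of string data that the rows really point at.
    fn referenced_bytes(&self) -> usize {
        self.views.iter().map(|v| v.2).sum()
    }

    /// Toy spill size, assuming buffers are written out whole.
    fn spilled_bytes(&self) -> usize {
        self.reported_size()
    }
}

/// Build a toy array that stores all rows in one data buffer.
fn toy_array(rows: &[&str]) -> ToyViewArray {
    let mut data = Vec::new();
    let mut views = Vec::new();
    for row in rows {
        views.push((0, data.len(), row.len()));
        data.extend_from_slice(row.as_bytes());
    }
    ToyViewArray { buffers: vec![Rc::new(data)], views }
}

/// Toy interleave: pick (source index, row index) pairs. The output shares
/// ALL buffers of ALL sources instead of copying the selected strings.
fn toy_interleave(sources: &[&ToyViewArray], indices: &[(usize, usize)]) -> ToyViewArray {
    let mut buffers = Vec::new();
    let mut base = Vec::new(); // first output buffer index for each source
    for source in sources {
        base.push(buffers.len());
        buffers.extend(source.buffers.iter().cloned()); // Rc clone, no data copy
    }
    let views = indices
        .iter()
        .map(|&(src, row)| {
            let (buf, offset, len) = sources[src].views[row];
            (base[src] + buf, offset, len)
        })
        .collect();
    ToyViewArray { buffers, views }
}
```

   In this model, interleaving one row from each of two sources that hold 
1000 rows of 100 bytes apiece gives `referenced_bytes()` of 200, while 
`reported_size()` and `spilled_bytes()` are both 200,032.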
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


