westonpace commented on issue #10073:
URL: https://github.com/apache/datafusion/issues/10073#issuecomment-2573542512

   Here's a pure-Rust, DataFusion-only example: 
https://github.com/westonpace/arrow-datafusion/commit/26ed75c51ad649a274063ad3fa1262b7025a17cf
   
   It takes a bit of time on the first run to generate the strings test file 
(it probably doesn't need to be so big).  After that it reproduces the issue 
quickly.
   
   I've also added some prints that hopefully highlight the issue.  Before we 
do an in-memory sort we have ~5MB of unsorted string data.  After sorting we 
have 8MB of sorted string data.
   
   This is not surprising to me.  During the sort we are probably building a 
string array with some kind of resize-on-append string builder that doubles 
its buffer, so we end up with ~8MB because the amount we need is between 4MB 
and 8MB.
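   
   A rough sketch of that arithmetic, assuming a plain doubling policy that 
starts from a 1 KiB buffer.  Both the policy and the starting size are 
assumptions for illustration, and `doubled_capacity` is a hypothetical helper, 
not something in DataFusion or arrow-rs; the builder the sort actually uses 
may grow differently.

```rust
/// Capacity reached by a buffer that doubles until it can hold `needed` bytes.
/// Hypothetical helper for illustration only; not a DataFusion or arrow-rs API.
fn doubled_capacity(initial: usize, needed: usize) -> usize {
    let mut cap = initial.max(1);
    while cap < needed {
        cap *= 2;
    }
    cap
}

fn main() {
    const MIB: usize = 1024 * 1024;
    // ~5MB of string data, as seen before the in-memory sort.
    let needed = 5 * MIB;
    // Assumed starting capacity of 1 KiB.
    let cap = doubled_capacity(1024, needed);
    println!("needed = {} MiB, reserved = {} MiB", needed / MIB, cap / MIB);
}
```

   Under these assumptions, anything between 4 MiB and 8 MiB of data ends up 
holding an 8 MiB buffer, which matches the jump from ~5MB to 8MB in the prints.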
   
   Unfortunately, this leads to a failure, which it probably should not.  I 
think @alamb had some good suggestions [in this 
comment](https://github.com/apache/datafusion/issues/10073#issuecomment-2056571501)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

