westonpace commented on issue #10073: URL: https://github.com/apache/datafusion/issues/10073#issuecomment-2573542512
Here's a pure-Rust, DataFusion-only example: https://github.com/westonpace/arrow-datafusion/commit/26ed75c51ad649a274063ad3fa1262b7025a17cf

It takes a bit of time on the first run to generate the strings test file (it probably doesn't need to be so big). After that it reproduces the issue quickly.

I've also added some prints that hopefully highlight the issue. Before we do an in-memory sort we have ~5MB of unsorted string data. After sorting we have 8MB of sorted string data. This is not surprising to me. During the sort we are probably building a string array with some kind of resize-on-append string builder that doubles its capacity, so we end up with ~8MB because the amount we need falls between 4MB and 8MB.

Unfortunately, this leads to a failure, which it probably should not.

I think @alamb had some good suggestions [in this comment](https://github.com/apache/datafusion/issues/10073#issuecomment-2056571501)
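To make the doubling arithmetic concrete, here is a minimal sketch of how a resize-on-append buffer could end up reserving 8MB for about 5MB of data. It is not DataFusion's or arrow's actual builder: `DoublingBuffer` and `simulate` are hypothetical names, and the 1 KiB starting capacity, the 64-byte value size, and the use of binary megabytes are assumptions chosen only for illustration. The buffer tracks sizes rather than real bytes, so nothing large is allocated.

```rust
/// Hypothetical stand-in for a resize-on-append string buffer.
/// It tracks only sizes, not bytes, so the sketch allocates nothing large.
struct DoublingBuffer {
    len: usize,      // bytes actually written
    capacity: usize, // bytes reserved
}

impl DoublingBuffer {
    fn with_capacity(capacity: usize) -> Self {
        // Avoid a zero capacity, which doubling could never grow.
        Self { len: 0, capacity: capacity.max(1) }
    }

    /// Account for appending `n` bytes, doubling capacity until they fit.
    fn append(&mut self, n: usize) {
        let needed = self.len + n;
        while self.capacity < needed {
            self.capacity *= 2;
        }
        self.len = needed;
    }
}

/// Append `count` values of `value_len` bytes each; return (len, capacity).
fn simulate(initial_capacity: usize, value_len: usize, count: usize) -> (usize, usize) {
    let mut buf = DoublingBuffer::with_capacity(initial_capacity);
    for _ in 0..count {
        buf.append(value_len);
    }
    (buf.len, buf.capacity)
}

fn main() {
    const MIB: usize = 1024 * 1024;
    // Assumed numbers: 1 KiB initial capacity, 64-byte strings, 5 MiB in total.
    let (len, capacity) = simulate(1024, 64, 5 * MIB / 64);
    println!("used: {} bytes, reserved: {} bytes", len, capacity);
    // 4 MiB is too small for 5 MiB of data, so the next doubling lands on 8 MiB.
    assert_eq!(len, 5 * MIB);
    assert_eq!(capacity, 8 * MIB);
}
```

Under these assumptions the reserved size is 1.6x the data actually held, which is the kind of gap that matters if memory accounting is based on reserved capacity rather than bytes in use.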