alamb opened a new issue, #15664: URL: https://github.com/apache/datafusion/issues/15664
### Is your feature request related to a problem or challenge?

- Part of https://github.com/apache/datafusion/issues/15271

There are many interesting ideas on how to improve DataFusion while spilling, for example https://github.com/apache/datafusion/issues/15271 from @2010YOUY01 and others.

What I think we really need next to make progress in this area is a benchmark, or another agreed-upon way of measuring our progress, so that we can improve.

### Describe the solution you'd like

I would like a documented command / set of commands that:

1. Is easy to run (and thus fast to test / iterate on)
2. Exercises the spilling feature at different levels of memory pressure
3. Spends most of its time sorting/spilling/merging (not generating output, for example)

### Describe alternatives you've considered

Idea 1: Use some `datafusion-cli` features / flags and document them

Idea 2: Add a new suite to bench.sh / `dfbench`: https://github.com/apache/datafusion/tree/main/benchmarks

As for what to do, I suggest something relatively simple, like sorting the TPCH lineitem table with 200MB, 500MB, 1GB, 5GB and 10GB of memory, for example.

### Additional context

_No response_

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
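As a rough illustration of the memory-pressure sweep suggested in the issue (sorting the TPCH lineitem table at 200MB through 10GB), the sketch below only builds and prints one `datafusion-cli` invocation per memory limit. It is not an agreed benchmark. Several things are assumptions rather than facts from the issue: the file name `lineitem.parquet`, the choice of `l_orderkey` as the sort key, the spelling of the size strings, and the use of the `--memory-limit` and `-c` options, which should be checked against `datafusion-cli --help` for the version in use.

```python
import shlex

# Memory limits named in the issue. The size-string spelling ("200m", "1g")
# is an assumption; verify it against `datafusion-cli --help`.
MEMORY_LIMITS = ["200m", "500m", "1g", "5g", "10g"]

# Hypothetical path to a TPCH lineitem file; the issue does not name one.
LINEITEM_PATH = "lineitem.parquet"

# Placeholder query: a full sort of lineitem. The sort key is an assumption,
# and a real benchmark would also need to keep output generation cheap.
SORT_QUERY = f"SELECT * FROM '{LINEITEM_PATH}' ORDER BY l_orderkey"


def build_commands(limits=MEMORY_LIMITS, query=SORT_QUERY):
    """Return one argv list per memory limit, without running anything."""
    return [
        ["datafusion-cli", "--memory-limit", limit, "-c", query]
        for limit in limits
    ]


if __name__ == "__main__":
    # Print shell-quoted commands so they can be reviewed, timed, or pasted
    # into a script by hand.
    for argv in build_commands():
        print(shlex.join(argv))
```

Keeping command construction separate from execution makes the sweep easy to review and to wrap in whatever timing harness is eventually chosen, whether that is `bench.sh`, `dfbench`, or something else.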