alamb commented on issue #15583: URL: https://github.com/apache/datafusion/issues/15583#issuecomment-2780659689
We had an earlier version of this feature, contributed by @gruuya.

The biggest challenge is finding suitable machines to run the benchmarks on: GitHub runners (which run the normal CI scripts) are shared VMs with wildly varying performance, so the numbers are even more variable.

What I suggest we focus on is a set of scripts that people can run and then report the results to PRs (thus punting the question of which machines actually run the benchmarks). The `bench.sh` file in the datafusion repo already handles running benchmarks pretty well.

What is not yet automated is:

1. Comparing results between a branch and main (aka comparing the performance of the branch)
2. Triggering this comparison

For 1, my script here compares a branch and main: https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh but it is not general purpose and makes a bunch of assumptions about the machine it runs on.

For 2, I now have some sort of shell script work queue that is probably not worth replicating.

So what I would suggest is:

1. Create a script, `datafusion/benchmarks/compare_pr.sh`, similar to https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh, that runs the benchmarks and reports results to GitHub

Then we can figure out how we want to trigger it.
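
To make the proposed flow concrete (benchmark main, benchmark the branch, compare, post to the PR), here is a rough sketch of what such a script could look like. It is not the linked `gh_compare_branch.sh`, and everything in it is an assumption for illustration: the function names, the `DRY_RUN` switch, and the report file name are hypothetical; it assumes it runs from the `benchmarks` directory, that both refs are local branch names, that `bench.sh` accepts `run <suite>` and `compare <branch1> <branch2>` as described in the benchmarks README, and that the GitHub CLI (`gh`) is installed and authenticated.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a compare_pr.sh-style driver (not an existing script).
set -eu

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Like run, but send the command's stdout to the file given as first argument.
run_to_file() {
    local out="$1"
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $* > $out"
    else
        "$@" > "$out"
    fi
}

# Check out one ref and benchmark it.
# Assumption: bench.sh stores its results per checked-out branch.
bench_ref() {
    local ref="$1" suite="$2"
    run git checkout "$ref"
    run ./bench.sh run "$suite"
}

# Benchmark base and branch, compare them, and post the report to a PR.
# Arguments: <pr-number> <base-branch> <pr-branch> <suite>
compare_pr() {
    local pr="$1" base="$2" branch="$3" suite="$4"
    local report="compare_${pr}.txt"   # hypothetical report file name
    bench_ref "$base" "$suite"
    bench_ref "$branch" "$suite"
    run_to_file "$report" ./bench.sh compare "$base" "$branch"
    # `gh pr comment` posts the file contents as a comment on the PR.
    run gh pr comment "$pr" --body-file "$report"
}
```

Running `DRY_RUN=1 compare_pr 123 main my-branch tpch` (the PR number and branch name are placeholders) only prints the commands that would be executed, which is a cheap way to check the sequence before spending machine time on a real run. How the script gets triggered is deliberately left out, matching the suggestion to decide that later.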