alamb commented on issue #15583:
URL: https://github.com/apache/datafusion/issues/15583#issuecomment-2780659689

   We had an earlier version of this feature contributed by @gruuya 
   
   The biggest challenge is finding suitable machines to run the benchmarks 
on. GitHub runners (which run the normal CI scripts) are shared VMs with wildly 
varying performance, so numbers measured on them are highly variable
   
   What I suggest we focus on is a set of scripts that people can run and 
report results to PRs (thus punting the question of what machines actually run 
the benchmarks). 
   
   The `bench.sh` script in the datafusion repo already handles running 
benchmarks pretty well
   
   What is not yet automated is:
   1. comparing results between a branch and main (aka comparing the 
performance of the branch)
   2. triggering this comparison
   
   For 1, my script here compares a branch and main: 
https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh 
but it is not general purpose and makes a bunch of assumptions about the 
machine it runs on
   
   For 2, I currently have some sort of shell script work queue that is 
probably not worth replicating
   
   
   
   So what I would suggest is:
   1.  create a script, `datafusion/benchmarks/compare_pr.sh`, similar to 
https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh 
that runs the benchmarks and reports results to github
   
   Then we can figure out how we want to trigger it
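
   To make the suggestion concrete, here is a rough sketch of what such a 
`compare_pr.sh` could look like. It is not an existing script and not the 
logic of `gh_compare_branch.sh`. It assumes it is run from the root of a 
datafusion checkout, that the GitHub CLI (`gh`) is installed and 
authenticated, and that `benchmarks/bench.sh` accepts the `data`, `run` and 
`compare` subcommands with results keyed by branch name. The `compare_pr` and 
`run` functions, the `DRY_RUN` variable, the `pr-<number>` branch name and the 
report file name are all made up for illustration. By default it only prints 
the commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of compare_pr.sh; not an existing DataFusion script.
# Assumptions: run from the root of a datafusion checkout, `gh` is installed
# and authenticated, and benchmarks/bench.sh supports `data`, `run`, `compare`.
set -eu

# DRY_RUN=1 (the default in this sketch) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Print a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

compare_pr() {
    local pr="$1"                    # pull request number
    local bench="${2:-tpch}"         # benchmark name handed to bench.sh
    local branch="pr-${pr}"          # local branch name (arbitrary choice)
    local report="compare-${pr}.txt" # file holding the comparison output

    # Baseline: generate data and run the benchmark on main.
    run git checkout main
    run ./benchmarks/bench.sh data "$bench"
    run ./benchmarks/bench.sh run "$bench"

    # Candidate: check out the PR into a local branch and run the same benchmark.
    run gh pr checkout "$pr" --branch "$branch"
    run ./benchmarks/bench.sh run "$bench"

    # Compare the two result sets and capture the output in the report file.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ ./benchmarks/bench.sh compare main $branch > $report"
    else
        ./benchmarks/bench.sh compare main "$branch" > "$report"
    fi

    # Post the report as a comment on the PR.
    run gh pr comment "$pr" --body-file "$report"
}

# Only act when arguments are given: compare_pr.sh <pr-number> [benchmark]
if [ "$#" -gt 0 ]; then
    compare_pr "$@"
fi
```

   Running it as `DRY_RUN=0 ./compare_pr.sh <pr-number> tpch` would execute 
the steps for real. Both runs happen back to back on the same machine, which 
is the point: the person running the script decides which machine that is, so 
the noisy shared runners are avoided.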


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

