Thanks all. Just to add a small note:
> * Create a wiki page collecting all the practices to reduce the hours
> (using the pr cancel workflow discussed earlier + timeouts + ...?)

We should probably also mention that Apache Spark managed to distribute
the workflow runs for pull requests to the forked repositories. See the
PRs:

- https://github.com/apache/spark/pull/32092
- https://github.com/apache/spark/pull/32193

and the umbrella JIRA: https://issues.apache.org/jira/browse/SPARK-35119

This is still a workaround, but it reduced the overhead significantly by
leveraging the resources of the forked repositories.

On Mon, Apr 19, 2021 at 12:41 AM, Antoine Pitrou <anto...@python.org> wrote:

>
> Hi Marton,
>
> Thanks a lot for the information you have collected and presented. This
> is very insightful!
>
> On 18/04/2021 at 11:06, Elek, Marton wrote:
> >
> > There are signs of misconfiguration of some jobs. For example, in some
> > projects I found many failed jobs with >15 hours of execution even if the
> > slowest successful (!) execution took only a few hours. It clearly shows
> > that a job-level timeout is not yet configured.
>
> Ok, I'm curious: according to the GHA docs, the default job
> timeout is 6 hours (360 minutes):
>
> https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#jobsjob_idtimeout-minutes
>
> In Arrow, we didn't change this setting... how come your stats show
> jobs taking up to 24 hours?
>
> Apparently, what's named "jobhours" in your statistics is actually the
> runtime for an entire workflow (the sum of all job runtimes for that
> workflow). That's at least what I conclude if I look at this workflow,
> which your table lists as the longest Arrow "job" with 24 hours of
> runtime: https://github.com/apache/arrow/actions/runs/699123317
> None of the jobs in that workflow took more than 6 hours, but added
> together they do end up at around 24 hours (because 4 jobs timed out
> at 6 hours).
>
> > Also the 46 or 36 hours of max job execution time sounds very
> > unrealistic (it's a job, not the full workflow).
>
> Well, according to the above it's the full workflow. It's still
> unexpected as far as Arrow is concerned, though, and we should implement
> per-job timeouts reflecting our expectations.
>
> > My suggestion:
> >
> > * Publish GitHub Actions usage in a central place which is clearly
> > visible for all Apache projects (I would be happy to volunteer here)
> >
> > * Identify an official suggestion of fair usage (monthly hours) per
> > project (easiest way: available hours / projects using GitHub Actions)
> >
> > * Create a wiki page collecting all the practices to reduce the hours
> > (using the pr cancel workflow discussed earlier + timeouts + ...?)
> >
> > * After every month, send a very polite reminder to the projects that
> > overuse GitHub Actions (using dev lists), including detailed statistics
> > and the wiki link to help them improve/reduce their usage.
>
> As a member of the Arrow PMC, I say +1 to all of this.
>
> Best regards
>
> Antoine.
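
P.S. To make the "jobhours" point in the quoted message concrete, here is a
minimal sketch of the arithmetic. It assumes a hypothetical workflow made up
of exactly four jobs that each hit the default 360-minute job timeout; the
function names and the job list are mine for illustration and are not taken
from the statistics script or from the actual Arrow run.

```python
from datetime import timedelta

# Default per-job timeout according to the GHA docs quoted above.
DEFAULT_JOB_TIMEOUT = timedelta(minutes=360)


def workflow_hours(job_runtimes):
    """Sum of all job runtimes in one workflow, in hours.

    This is the quantity the quoted statistics appear to report as "jobhours".
    """
    total = sum(job_runtimes, timedelta())
    return total.total_seconds() / 3600


def longest_job_hours(job_runtimes):
    """Runtime of the single slowest job in the workflow, in hours."""
    return max(job_runtimes).total_seconds() / 3600


# Hypothetical workflow: four jobs, each cut off at the default timeout.
jobs = [DEFAULT_JOB_TIMEOUT] * 4

print(workflow_hours(jobs))     # 24.0
print(longest_job_hours(jobs))  # 6.0
```

So a workflow can show up with 24 hours in the table even though no single
job ran longer than the 6-hour limit.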