Hi Marton,

Thanks a lot for the information you have collected and presented. This is very insightful!

On 18/04/2021 at 11:06, Elek, Marton wrote:

There are signs of misconfiguration in some jobs. For example, in some
projects I found many failed jobs that ran for more than 15 hours even
though the slowest successful (!) execution took only a few hours. This
clearly shows that a job-level timeout is not yet configured.

Ok, I'm curious: according to the GHA docs, the default job
timeout is 6 hours (360 minutes):
https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#jobsjob_idtimeout-minutes

In Arrow, we didn't change this setting... how come your stats show
jobs taking up to 24 hours?

Apparently, what's named "jobhours" in your statistics is actually the runtime of an entire workflow (the sum of all job runtimes for that workflow). At least that's what I conclude from looking at this workflow, which your table lists as the longest Arrow "job" with 24 hours of runtime: https://github.com/apache/arrow/actions/runs/699123317 None of the jobs in that workflow took more than 6 hours, but added together they do end up at around 24 hours (because 4 jobs timed out at 6 hours).
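
To make the distinction concrete, here is a minimal Python sketch of how a per-workflow sum can reach about 24 hours while no single job exceeds the 6-hour default. The job names and durations are made up for illustration; they are not the actual figures from that run.

```python
# Default GHA job timeout: 360 minutes, per the docs linked above.
DEFAULT_JOB_TIMEOUT_HOURS = 6.0

# Hypothetical job durations (hours) for one workflow run.
# Illustrative values only, not taken from the Arrow run.
job_hours = {
    "job-a": 6.0,   # hit the default timeout
    "job-b": 6.0,   # hit the default timeout
    "job-c": 6.0,   # hit the default timeout
    "job-d": 6.0,   # hit the default timeout
    "job-e": 0.25,
    "job-f": 0.25,
}


def summarize(durations, timeout=DEFAULT_JOB_TIMEOUT_HOURS):
    """Return the workflow total, the longest single job, and timed-out jobs."""
    return {
        "workflow_hours": sum(durations.values()),
        "longest_job_hours": max(durations.values()),
        "timed_out_jobs": sorted(
            name for name, hours in durations.items() if hours >= timeout
        ),
    }


summary = summarize(job_hours)
print(summary)
```

With these assumed values the workflow total is 24.5 hours, although the longest single job is capped at 6 hours.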

Also, the 46 or 36 hours of max job execution time sound very
unrealistic (it's a job, not the full workflow).

Well, according to the above it's the full workflow. It's still unexpected as far as Arrow is concerned, though, and we should implement per-job timeouts reflecting our expectations.
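
As a rough sketch of how one might find the jobs that still rely on the default, the Python snippet below lists jobs without a `timeout-minutes` key. It assumes the workflow file has already been parsed into a dict (for example by a YAML loader); the workflow shown and the helper name `jobs_without_timeout` are hypothetical.

```python
def jobs_without_timeout(workflow):
    """Return the sorted names of jobs that do not set `timeout-minutes`."""
    jobs = workflow.get("jobs") or {}
    return sorted(
        name for name, spec in jobs.items()
        if "timeout-minutes" not in (spec or {})
    )


# Hypothetical parsed workflow, for illustration only.
example_workflow = {
    "name": "CI",
    "jobs": {
        "build": {"runs-on": "ubuntu-latest", "timeout-minutes": 60},
        "test": {"runs-on": "ubuntu-latest"},
        "docs": {"runs-on": "ubuntu-latest"},
    },
}

missing = jobs_without_timeout(example_workflow)
print(missing)
```

Any job reported here would fall back to the 360-minute default mentioned above.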

My suggestion:

   * Publish GitHub Actions usage in a central place which is clearly
visible for all Apache projects (I would be happy to volunteer here)

   * Identify an official suggestion of fair usage (monthly hours) per
project (easiest way: available hours / projects using GitHub Actions)

   * Create a wiki page collecting all the practices to reduce the hours
(using the PR cancel workflow discussed earlier + timeouts + ...?)

   * After every month, send a very polite reminder to the projects that
overuse GitHub Actions (using dev lists), including detailed statistics
and the wiki link to help them improve/reduce their usage.

As a member of the Arrow PMC, I say +1 to all of this.
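
For what it's worth, the "available hours / projects" rule suggested above could be sketched in Python as follows. All project names and hour figures are invented for the example, and the function names are hypothetical.

```python
def fair_share(available_hours, usage):
    """Equal monthly share: available hours divided by the number of projects."""
    return available_hours / len(usage)


def over_users(available_hours, usage):
    """Map each project above the fair share to its excess hours."""
    share = fair_share(available_hours, usage)
    return {
        name: hours - share
        for name, hours in usage.items()
        if hours > share
    }


# Hypothetical monthly usage in hours, for illustration only.
monthly_usage = {"project-a": 1800.0, "project-b": 700.0, "project-c": 500.0}
available = 3000.0

share = fair_share(available, monthly_usage)
excess = over_users(available, monthly_usage)
print(share, excess)
```

With these assumed numbers each project gets 1000 hours, and only project-a would receive a reminder (800 hours over).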

Best regards

Antoine.
