Thanks all.

Just to add a brief note:

>  * Create a wiki page collecting all the practices to reduce the hours
> (using the pr cancel workflow discussed earlier + timeouts + ...?)

We should probably also mention that Apache Spark managed to distribute the
workflow runs to forked repositories in pull requests, see the PRs:
- https://github.com/apache/spark/pull/32092
- https://github.com/apache/spark/pull/32193
and umbrella JIRA: https://issues.apache.org/jira/browse/SPARK-35119

This is still a workaround, but it significantly reduced the overhead
by using the resources of the forked repositories.


On Mon, Apr 19, 2021 at 12:41 AM, Antoine Pitrou <anto...@python.org> wrote:

>
> Hi Marton,
>
> Thanks a lot for the information you have collected and presented.  This
> is very insightful!
>
> On 18/04/2021 at 11:06, Elek, Marton wrote:
> >
> > There are signs of misconfiguration in some jobs. For example, in some
> > projects I found many failed jobs with >15 hours of execution even if the
> > slowest successful (!) execution took only a few hours. It clearly shows
> > that a job-level timeout is not yet configured.
>
> Ok, I'm curious: according to the GHA docs, the default job
> timeout is 6 hours (360 minutes):
>
> https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#jobsjob_idtimeout-minutes
>
> In Arrow, we didn't change this setting... how come your stats show
> jobs taking up to 24 hours?
>
> Apparently, what's named "jobhours" in your statistics is actually the
> runtime for an entire workflow (the sum of all job runtimes for that
> workflow).  That's at least what I conclude if I look at this workflow,
> which your table lists as the longest Arrow "job" with 24 hours of
> runtime: https://github.com/apache/arrow/actions/runs/699123317
> None of the jobs in that workflow took more than 6 hours, but cumulatively
> they indeed end up around 24 hours... (because 4 jobs timed out at 6 hours)
>
> > Also the 46 or 36 hours of max job execution time sounds very
> > unrealistic (it's a job, not the full workflow).
>
> Well, according to the above it's the full workflow.  It's still
> unexpected as far as Arrow is concerned, though, and we should implement
> per-job timeouts reflecting our expectations.
>
> > My suggestion:
> >
> >    * Publish GitHub Actions usage in a central place which is clearly
> > visible for all Apache projects (I would be happy to volunteer here)
> >
> >    * Agree on an official fair-usage suggestion (monthly hours) per
> > project (easiest way: available hours / projects using GitHub Actions)
> >
> >    * Create a wiki page collecting all the practices to reduce the hours
> > (using the pr cancel workflow discussed earlier + timeouts + ...?)
> >
> >    * After every month send a very polite reminder to the projects that
> > overuse GitHub Actions (using dev lists) including detailed statistics
> > and the wiki link to help them to improve/reduce the usage.
>
> As a member of the Arrow PMC, I say +1 to all of this.
>
> Best regards
>
> Antoine.
>
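
To make the job vs. workflow distinction discussed above concrete, here is
a minimal Python sketch. It assumes, as Antoine concludes, that "jobhours"
is the sum of all job runtimes in a workflow. The Job class, the helper
names and the job names are hypothetical, and the figures are made up to
mirror the "4 jobs timed out at 6 hours" case rather than taken from the
real Arrow run.

```python
from dataclasses import dataclass

# Default GHA job timeout (360 minutes), per the docs linked in the thread.
DEFAULT_JOB_TIMEOUT_HOURS = 6.0


@dataclass
class Job:
    name: str     # hypothetical job name
    hours: float  # wall-clock runtime of the job, in hours


def workflow_hours(jobs):
    """Sum of all job runtimes in one workflow run."""
    return sum(job.hours for job in jobs)


def jobs_at_timeout(jobs, limit=DEFAULT_JOB_TIMEOUT_HOURS):
    """Names of jobs whose runtime reached the limit (likely timed out)."""
    return [job.name for job in jobs if job.hours >= limit]


# Made-up data: four jobs that each ran into the 6-hour default timeout.
jobs = [Job(f"job-{i}", 6.0) for i in range(1, 5)]
total = workflow_hours(jobs)
timed_out = jobs_at_timeout(jobs)
print(total, timed_out)
```

With these inputs no single job exceeds 6 hours, yet the workflow total
is 24 hours, which is why a per-job timeout well below the default caps
the cost of a hung run.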
