Also, some comments on the stats. This is good stuff, Marton.
> Apparently, what's named "jobhours" in your statistics is actually the
> runtime for an entire workflow (the sum of all job runtimes for that
> workflow). That's at least what I conclude if I look at this workflow,
> which your table lists as the longest Arrow "job" with 24 hours of
> runtime: https://github.com/apache/arrow/actions/runs/699123317
> None of the jobs in that workflow took more than 6 hours, but cumulated
> they indeed end up around 24 hours... (because 4 jobs timed out at 6 hours)

It does look like you have workflows rather than jobs - we had very similar problems when we (Tobiasz, one of the Airflow contributors) tried to get the stats. The REST API limitations are super painful: there is no way to dig down to the job level efficiently (and unfortunately there is no GraphQL equivalent that would let us do it).

We found that rather than looking at job hours, it's much better to look at the number of "in-progress" and "queued" workflows from each project. That gives a much better overview of what's going on.

Together with Gavin and the infra team we passed a request to GitHub for some extracts of the stats. Until we have those, we run our own "poor man's" extracts regularly and store them in Google BigQuery, with a simple Data Studio report on top (unfortunately we cannot share it with everyone, as it would incur costs if used publicly). We do try to keep screenshots updated in this doc, where I track the status of the current GA integration with ASF infra: https://cwiki.apache.org/confluence/display/BUILDS/GitHub+Actions+status

Here are some of the latest screenshots:

April stats: https://ibb.co/mCL6kZh
March and April stats: https://ibb.co/r2zjNsV

Those two will show you the variability. A short summary for those who don't like reading graphs: Pulsar went down quite a bit in March/April, while Arrow became the biggest user of jobs, with Spark in second place (though with Hyukjin's changes I believe it will go down soon).
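Coming back to the jobhours point: the distinction is easy to reproduce. GitHub's REST API exposes the jobs of a run at GET /repos/{owner}/{repo}/actions/runs/{run_id}/jobs, each with "started_at"/"completed_at" timestamps. A minimal Python sketch (function names and the sample payload are mine, not from anyone's actual script) showing how four 6-hour jobs add up to 24 "job hours":

```python
# Sketch: per-job runtime vs. summed "workflow hours".
# Jobs of a run come from: GET /repos/{owner}/{repo}/actions/runs/{run_id}/jobs
# (For peak monitoring, GET /repos/{owner}/{repo}/actions/runs?status=queued
# similarly lists currently queued runs.)
from datetime import datetime, timezone

def _seconds(job):
    """Runtime of one job in seconds (0 if it never completed)."""
    if not job.get("completed_at"):
        return 0.0
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start = datetime.strptime(job["started_at"], fmt).replace(tzinfo=timezone.utc)
    end = datetime.strptime(job["completed_at"], fmt).replace(tzinfo=timezone.utc)
    return (end - start).total_seconds()

def run_stats(jobs):
    """Return (longest single job, summed 'workflow hours'), both in hours."""
    durations = [_seconds(j) / 3600.0 for j in jobs]
    return max(durations, default=0.0), sum(durations)

# Hypothetical payload mirroring the Arrow run linked above: four jobs
# each hitting a 6-hour timeout sum to 24 "job hours", even though no
# single job exceeded 6 hours.
jobs = [
    {"started_at": "2021-04-01T00:00:00Z", "completed_at": "2021-04-01T06:00:00Z"}
    for _ in range(4)
]
longest, total = run_stats(jobs)
print(longest, total)  # 6.0 24.0
```

The summing itself is trivial; the pain is purely that the REST API forces one extra request per run to get at the jobs, which is what makes job-level stats so expensive to collect.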
In the meantime apisix-dashboard seems to be on the rise and Pulsar is coming back. Here you can see the peaks in the number of workflows: https://ibb.co/QCJdLGD

But this one is the most important: the number of ASF projects using GA since November: https://ibb.co/RpFyQQy

That last one matters most, because as I see it, none of the proposals below will really work - they might help temporarily if some projects optimize, but new projects will keep arriving. Since November we have been continuously fighting for jobs at peak times, and various projects that got fed up with it have found workarounds or moved elsewhere. That will continue.

> > * Publish Github action usage in a central place which is clearly
> > visible for all Apache projects (I would be happy to volunteer here)

Oh yeah. If we could only get good stats, that would be great, but with the current API limitations it seems very difficult. Still, if you could do that it would be great - to be precise, though, we need peak-hours stats and peak-hours limits.

> > * Identify official suggestion of fair-usage (monthly hours) per
> > project (easiest way: available hours / projects using github actions)

The problem is that with the fixed number of jobs we have, more projects coming, AND the fact that our problems are at peak times, this stat is a) wrong (overall build hours do not matter much - peak hours do) and b) bound to trend downwards as more projects arrive. It's the peak hours we need to limit, not overall hours, and peak-hours usage is out of the projects' own control: those peak hours mostly come from contributors raising new PRs, and there is not much each project can do to reduce them. It's not only about best practices, cancelling, etc. - the main driver is how many PRs are raised within a time window.
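For reference, the "cancelling" and timeout practices mentioned above can be expressed directly in a workflow file via GitHub Actions' `concurrency` setting and `timeout-minutes`. A sketch only - the workflow name, group key, and build step are hypothetical, not from any particular project:

```yaml
name: CI
on: pull_request

# Cancel the in-flight run when the same PR receives a newer push,
# so each contributor occupies at most one slot per workflow.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest
    # Cap runaway jobs well below the 6-hour default that produced
    # the 24 "job hours" in the Arrow run discussed earlier.
    timeout-minutes: 60
    steps:
      - uses: actions/checkout@v2
      - run: ./build.sh   # hypothetical build entrypoint
```

This trims waste per PR, but as argued above it does not change how many PRs arrive in a peak window.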
And there isn't much we can do about that - other than give everyone their own lane (and I mean every contributor, really - this is what Hyukjin did). No matter how hard projects try, it can't really be controlled otherwise.

> > * Create a wiki page collecting all the practices to reduce the hours
> > (using the pr cancel workflow discussed earlier + timeouts + ...?)

It's there: https://cwiki.apache.org/confluence/display/BUILDS/GitHub+Actions+status
Good start. We can continue improving it.

> > * After every month send a very polite reminder to the projects who
> > overuses github actions (using dev lists) including detailed statistics
> > and the wiki link to help them to improve/reduce the usage.

Having good stats is a good starting point for that. But there is only so much we can do, and with the current growth in usage this mostly defers the inevitable by a couple of weeks or months, even if everyone implements all the optimisations. I think distributing "build hours" per committer is really the only sustainable long-term approach.

--
+48 660 796 129