I am also very sad that our CI is so slow. Apart from the obvious technical problems, the fact that the project is developing very slowly is also bad for the health of the community.
Among active members of the community, this causes reluctance to continue working. I have fallen victim to it myself: I recently became active in other projects that do not use Apache infrastructure, because my work in Apache had come to focus on CI issues. I constantly have to fight with aggressive CI optimizations, or simply with CI instability. This is not my passion; I would rather focus on other tasks, and other projects allow me to do so.

For new contributors, it is also a very unpleasant first experience. Many of them run into problems caused by aggressive CI optimizations and CI instability, and assume they have made an error in their contribution. When I check later, it turns out that their code is fine and the problem is the infrastructure on which the tests were run. After a few tries, some people feel so discouraged that they do not contribute further.

I would be happy if we could get a tip on what we can do next to improve our CI and keep our community happy. Is there any solution that complies with Apache policy and is more stable? In Airflow, we have already tried several solutions: we started with Travis CI, then tried GitLab CI, and now we are trying GitHub Actions. Each solution has its own problems and prevents us from working freely. Does anyone have a hint on what we can use now? We even have the funds to run CI on our own servers, but we would like to know which product would make our CI stable so that we do not have to migrate again.

For now, our main idea is to use self-hosted runners for GitHub Actions, but we are concerned about the security implications. Can we contact GitHub to discuss this topic? Does the ASF have a contact with an account manager there? Can someone else forward our messages?

On 2021/01/08 19:09:46, Jarek Potiuk <j...@polidea.com> wrote:
> Hello everyone (Gavin, Sander especially),
>
> Over the last few days again the queue for GA got completely blocked.
> We have 2-4 jobs in parallel max and our speed of merging PRs dropped to 1 per 4-5 hours.
>
> We really need to find out how to solve the problem together with the Github account that we were supposed to meet, because it will only get worse.
>
> My colleague is waiting (https://github.com/apache/airflow/pull/13409#issuecomment-756364484) with 5 PRs for my PR to be merged.
>
> I submitted it (again) yesterday morning only to find out in the evening that it failed in the middle. This morning I fixed it (I hope) and submitted it again, and it's 8 pm and I am still at 2/3 of it (30 out of 50 checks green). It usually takes up to 30 minutes to complete.
>
> There are two things that could probably be improved with INFRA involvement:
>
> 1) I heavily optimized our setup. I literally ran out of optimization ideas yesterday.
>
> 2) We secured our funds for self-hosted runners, however
>
> 3) We still cannot use self-hosted runners due to
> https://docs.github.com/en/free-pro-team@latest/actions/hosting-your-own-runners/about-self-hosted-runners#self-hosted-runner-security-with-public-repositories
>
> Ash from Airflow even prepared a PR: https://github.com/actions/runner/pull/783 that should allow us to mitigate the security problem with self-hosted runners for public repos, but we have not seen anything since 2nd of November.
>
> 4) We are working (Ash again) on using the PR to make a fork of the runner and set it up regardless of the approval of the GH Actions team, but we are not sure how secure and robust it will be (https://github.com/ashb/runner/commit/448341ee47c123f0d3d56c0bb1be9d292fc646ee) because we basically have to very quickly and automatically rebase our changes on top of new releases from GitHub.
> This is madness and it will cost us a lot of engineering and maintenance time.
>
> 5) We (Tobiasz from the Airflow team) even developed this (far from perfect) dashboard that gathers information about the number of GA workflows in progress/queued per project, and it clearly shows the situation is getting worse by the day:
>
> https://pasteboard.co/JIJa5Xg.png
>
> 6) I opened 18 tickets to Github support and pretty much all of them are either recurring or we found a way to mitigate them:
>
> https://pasteboard.co/JIJbIC9.png
>
> 7) I do not even mention the two critical security issues that are opened for Github Actions resulting from the Xmas incident (they were raised through bounty.github.com and wait for acknowledgment till today).
>
> We are pretty much stuck and there is no viable option it seems. I, again, literally ran out of ideas about what we can do. It seems that at least the "self-hosted security problem" is something that could be addressed without a heavy investment from either INFRA or GitHub, but we have no leverage on them.
>
> Is there something we can do via our Github account? We were supposed to get a meeting with them but it got cancelled.
>
> Can we at least organize the meeting and urge them to fix the security problem for self-hosted runners on public repositories?
>
> This is not a complaint, this is just crying for HELP ... We are terribly stuck.
>
> J,
>
> --
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
> M: +48 660 796 129 <+48660796129>