Hi!

The llvm rpm (https://src.fedoraproject.org/rpms/llvm) has recently been
struggling with Fedora CI, by which I mean the CI that produces scratch
builds and runs dist-git tests on merge requests.

The llvm rpm combines long build times (typically 3-8 hours on koji,
depending on arch) with many merge requests, which is where things break
down. We've received multiple complaints that scratch builds for our MRs
occasionally end up clogging s390x koji. This is due to a number of
problems with Fedora CI:

* Both Zuul and Fedora CI produce their own independent scratch builds,
doubling the load. I think this is tracked as part of
https://pagure.io/fedora-ci/general/issue/476. This is the only problem we
were able to address ourselves, by disabling Zuul.

* Fedora CI does not cancel old scratch builds when a new commit is pushed
or the MR is rebased (https://pagure.io/fedora-ci/general/issue/493). This
means that if some changes are pushed in response to MR feedback, you end
up with an extra set of scratch builds running in parallel. This is further
exacerbated by Pagure not having proper support for rebase merges, so if
you hit Rebase and then Merge you also get a bonus scratch build. (I'm not
sure whether Zuul properly cancels scratch builds, or whether it produces
zombies as well.)

* Fedora CI has no configurability. For example, we can't disable just the
s390x scratch builds (https://pagure.io/fedora-ci/general/issue/494), which
tend to be more than twice as slow as other builds.

* As far as I know, it's not even possible to disable Fedora CI entirely,
e.g. to use only Zuul instead. Similarly, we can't turn off the automatic
triggering of Fedora CI and require a manual [citest] instead.

* For scratch builds longer than 4 hours, Fedora CI never reports back
the result (https://pagure.io/fedora-ci/general/issue/485), even though the
scratch build continues running. The CI status stays pending forever.
For llvm, all scratch builds take more than 4 hours, so we never get
results. This also means that dist-git tests never run. I submitted a PR to
raise this timeout
(https://github.com/fedora-ci/dist-git-build-pipeline/pull/41) but haven't
received a response.

It's not really necessary to solve *all* of these problems -- I think the
MVP to make MRs usable for llvm without negatively affecting other people
would probably be to increase the timeout and either a) implement
auto-cancellation for scratch builds or b) allow preventing auto-start of
CI, requiring manual [citest]. (Naively, I assume the latter is easier to
implement.)
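
As a rough illustration of option a), the selection step of
auto-cancellation could look something like the sketch below. The
ScratchBuild record and its fields are hypothetical and not taken from the
Fedora CI code. The sketch only picks which builds are stale; the actual
cancellation through koji is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScratchBuild:
    # Hypothetical record; all field names are assumptions for illustration.
    task_id: int    # koji task ID of the scratch build
    mr_id: int      # merge request the build was triggered for
    sequence: int   # order in which the commit was pushed to the MR
    running: bool   # True while the scratch build is still in progress


def stale_task_ids(builds: list[ScratchBuild]) -> list[int]:
    """Return task IDs of running builds superseded by a newer push."""
    # Find the most recent push seen for each merge request.
    latest: dict[int, int] = {}
    for b in builds:
        latest[b.mr_id] = max(latest.get(b.mr_id, b.sequence), b.sequence)
    # A build is stale if it is still running but belongs to an older push.
    return sorted(
        b.task_id
        for b in builds
        if b.running and b.sequence < latest[b.mr_id]
    )


builds = [
    ScratchBuild(task_id=101, mr_id=7, sequence=1, running=True),
    ScratchBuild(task_id=102, mr_id=7, sequence=2, running=True),
    ScratchBuild(task_id=103, mr_id=8, sequence=1, running=True),
    ScratchBuild(task_id=104, mr_id=7, sequence=0, running=False),
]
print(stale_task_ids(builds))
```

Here only task 101 would be selected: it is still running, but MR 7 has
since received a newer push. Finished builds and builds for the latest push
of each MR are left alone.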

However, I haven't been able to get any response from maintainers on Fedora
CI issues or PRs, so I'm not really sure what to do here anymore. Hence this
mail to fedora-devel. I'd appreciate any pointers on how to move forward.

Regards,
Nikita