On Thu, 13 Feb 2025 at 14:53, Neal Gompa <ngomp...@gmail.com> wrote:

> On Thu, Feb 13, 2025 at 8:32 AM Clement Verna <cve...@fedoraproject.org>
> wrote:
> >
> >
> >
> > On Thu, 13 Feb 2025 at 13:04, Neal Gompa <ngomp...@gmail.com> wrote:
> >>
> >> On Thu, Feb 13, 2025 at 5:46 AM Clement Verna <cve...@fedoraproject.org> wrote:
> >> >
> >> >
> >> >
> >> > On Thu, 13 Feb 2025 at 10:13, Dan Horák <d...@danny.cz> wrote:
> >> >>
> >> >> On Thu, 13 Feb 2025 09:32:06 +0100
> >> >> Clement Verna <cve...@fedoraproject.org> wrote:
> >> >>
> >> >> > Cross-posting from
> >> >> > https://discussion.fedoraproject.org/t/gating-fedora-updates-on-fedora-coreos-ci/144566
> >> >> >
> >> >> > Hi all,
> >> >> >
> >> >> > Last year, the Fedora CoreOS working group implemented CI testing [1]
> >> >> > for Bodhi updates on a set of critical packages [2]. Automatic updates
> >> >> > are a key feature of Fedora CoreOS, and this testing helps us detect
> >> >> > update-related issues early, improving Fedora’s update stability and
> >> >> > reducing troubleshooting time.
> >> >> >
> >> >> > While our long-term goal is to implement this CI testing in
> >> >> > fedora-bootc with Bodhi gating integration, there's still significant
> >> >> > work ahead before we can trigger fedora-bootc tests on Bodhi updates.
> >> >> > It's worth noting that many of the tests currently running in Fedora
> >> >> > CoreOS CI are essentially "image mode" tests rather than
> >> >> > CoreOS-specific tests. Eventually, we expect to migrate these tests to
> >> >> > fedora-bootc. However, until that infrastructure is ready, enabling
> >> >> > gating on the FCOS suite provides immediate image mode coverage for
> >> >> > critical packages.
> >> >> >
> >> >> > Given our experience running these tests, we would like to propose
> >> >> > making coreos.cosa.build-and-test a required gate for package updates
> >> >> > in rawhide. We've already been successfully gating packages owned by
> >> >> > the Fedora CoreOS working group [3], and we'd like to extend this
> >> >> > requirement to the broader package set defined here [4].
> >> >> >
> >> >> > The following is a breakdown of passed vs. failed builds by package
> >> >> > across more than 400 builds; it gives package maintainers an idea of
> >> >> > how often an update might be gated. It is important to note that not
> >> >> > all test failures here are related to the software in the proposed
> >> >> > Bodhi update, since there could be flakes, either due to the test
> >> >> > infra environment or due to some transient test pipeline
> >> >> > misconfiguration. In cases where failures are not related to the
> >> >> > update, it would be easy to waive the test or coordinate with the
> >> >> > Fedora CoreOS working group to disable the test.
> >> >>
> >> >> Gating based on flaky tests or flaky infra is a no-go, sorry ... You
> >> >> should define an "acceptable false positive" rate first (1%?, 2%?),
> >> >> then fix tests and infra and then think about gating. Even when half of
> >> >> the presented failures are not caused by the package under test, it's
> >> >> too much.
> >> >
> >> >
> >> > We are starting this conversation because we have good confidence that
> >> > flakes and infrastructure failures are a minority of cases.
> >> > These tests are running on the Fedora Infrastructure (OpenShift
> >> > cluster), so not on a dedicated infra that would need something special
> >> > to be fixed. Our test framework also has features to make the tests
> >> > resistant to flakes (we re-run failing tests), and we are also able to
> >> > easily snooze tests for a period of time if needed. If, with all of
> >> > this, a critical update is still blocked because of a false positive, it
> >> > is fairly easy to waive the test in Bodhi
> >> > (https://docs.fedoraproject.org/en-US/rawhide-gating/faq/#_how_do_i_unblock_an_update).
> >> >
> >> > Our tests have caught regressions in the past which landed in Fedora
> >> > and affected Fedora users, not just Fedora CoreOS users, so we believe
> >> > that there is a lot of value in making these tests blocking.
> >> >
> >>
> >> Just because it's easy to waive them doesn't mean it's a good idea to
> >> do so.
> >
> >
> > Completely agreed, waiving is handy in case of an infrastructure issue;
> > otherwise, if the tests are failing it's for a good reason :-) We are
> > catching a lot of real bugs which are not CoreOS specific.
> >
> > https://github.com/coreos/fedora-coreos-tracker/issues?q=%20label%3A%22pipeline%20failure%22%20
> >
> >
> >>
> >> Also, because Fedora CoreOS doesn't actually follow Fedora with
> >> updates (y'all have that pool and manifest thingy that lets you skip
> >> or downgrade freely), I'm not sure it actually makes sense to make it
> >> a required gate as long as you are doing that.
> >
> >
> > This is a chicken-and-egg problem: we have that mechanism because we want
> > to protect our users from regressions. If we start gating updates on the
> > FCOS CI, that mechanism would not be so useful anymore.
> >
> >>
> >>
> >> Mandatory gating exists for things where shipping it would utterly
> >> break things without releng intervention. It is not possible to get
> >> that far for Fedora CoreOS because you neither take in updates
> >> automatically, nor can users consume them easily.
> >
> >
> > It is worth noting that what we are testing is not only CoreOS specific;
> > we are finding a lot of issues that affect all of Fedora. There is also
> > going to be a lot of overlap with fedora-bootc, which follows a more
> > traditional release and update mechanism.
> >
> >>
> >>
> >> We already have a problem with some tests being hard for update
> >> submitters to troubleshoot and resolve, I would like to not add more
> >> of it.
> >
> >
> > I am not sure there is much value in making it easy to push an update
> > which is going to end up breaking some systems. I can empathize with the
> > fact that maintaining packages is a cumbersome effort, but we also have an
> > opportunity to improve Fedora stability and catch issues earlier than we
> > currently do.
> >
>
> What are you hoping to achieve by making it mandatory to pass?


Currently, when an FCOS build fails, we spend a lot of time finding out which
update caused the failure. Since each build contains a batch of updates, it
is not always trivial to pin down which update was the root cause. By moving
these tests onto updates and making them required, it will be much easier to
identify the root cause before it hits our build pipeline.


> Do you intend to be more active in assisting in resolving issues in those
> packages? What's the commitment from CoreOS for this?


We are already doing this and it won't change. This just makes our lives
easier by identifying issues earlier and sparing us from having to disable
the breaking updates in Fedora CoreOS (update overrides), which is what you
mentioned earlier about Fedora CoreOS not following updates.


> The packagers
> themselves can't resolve issues that are CoreOS specific, so there
> needs to be something on the other end to help in these cases.


We are not asking for that, and we are available to help, but again it is
quite rare that we find issues that are CoreOS specific. This work will
benefit all of Fedora :-)

It is also worth noting that Fedora CoreOS is an official edition with over
100k active deployments
(https://discussion.fedoraproject.org/t/fedora-coreos-numbers-02-2025-edition/144475),
so it would be a nice side effect for packagers to get more familiar with
CoreOS, as it is an important part of how Fedora packages are consumed.


>
>
> --
> 真実はいつも一つ!/ Always, there's only one truth!
> --
> _______________________________________________
> devel mailing list -- devel@lists.fedoraproject.org
> To unsubscribe send an email to devel-le...@lists.fedoraproject.org
> Fedora Code of Conduct:
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives:
> https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
> Do not reply to spam, report it:
> https://pagure.io/fedora-infrastructure/new_issue
>