On 2/14/25 5:07 AM, Pierre-Yves Chibon wrote:
> On Fri, Feb 14, 2025 at 10:34:51AM +0100, Clement Verna wrote:
>> On Thu, 13 Feb 2025 at 17:44, Kevin Fenzi <ke...@scrye.com> wrote:
>>
>> I agree with downthread folks that that seems like way too high a
>> failure rate to enable gating on. However, a few questions if I can:
>>
>> Yes the failure rate is quite high and most of these are real failures,
>> that we deal with in Fedora CoreOS. So I am reading this like, because the
>> tests are catching too many failures we should continue ignoring them 🫤
>
> I think what is scaring people with the data you've provided is that we do not
> know which %/numbers of these failures are genuine failures that should gate the
> update because they are bugs vs infrastructure/pipeline issues.
> Would you have a way to distinguish between the two? Basically a failure vs
> error output.
I think what you bring up here is valid, and in our next round of metrics we will come up with a way to classify the failures so we can get a clearer picture. However, I'd like to propose that we don't let this discourage us from moving forward.

You've raised concerns and we hear you. What I will say, though, is that we do monitor these failures (hence the Matrix channel), and we do restart tests if we believe they are failing due to flakes or issues on our side. In other words, if a failure is believed to be on our side, we try to resolve the issue without package maintainers needing to do anything. Now, will we always be looking at them in real time? No. However, I would propose that we gate by default and allow some time to determine the root cause before waiving.

>
> The push-back I'm hearing is more toward: there are a lot of failures here and
> if they are all related to infrastructure issues then we're going to cause
> disruption without a clear benefits.

I'd like to push back slightly on the word "disruption" here. IMO "disruption" better describes the case where a test fails (keep in mind we are already running the tests and reporting the results), the update goes in anyway, and it causes issues in downstream built artifacts. We (Fedora as a whole) were given bad results and the update went in anyway.

> Now if you're able to say: "95% of these errors are genuine bug that today are
> impacting our users despite our pipeline having found it and 5% are
> infrastructure related", that's a different story :)

IMO the bar would only need to be that high if the user had no way to ignore the test results. All gating does here (IIUC) is require them to take an extra step (waiving the result) before the update automatically flows into the next Rawhide compose.
Dusty

--
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue