Hi folks,

My team has been working on an issue on and off since July 23rd.
I think we might have hit the jackpot in reproducing the issue that first affected us on July 23rd. Here's what happened:

- Once the copy of the prod Jenkins home finished, I started Jenkins in quiet mode <https://support.cloudbees.com/hc/en-us/articles/203737684-How-can-I-prevent-jenkins-from-starting-new-jobs-after-a-restart-?page=92> (I didn't want a scheduled prod deployment to run in stage by mistake). Jenkins started without issues.
- Then I disabled all the jobs (again, to keep a job from running by mistake once I took Jenkins out of quiet mode).
- Then, since we were running stage with production's config, the stage controller actually connected to the prod AWS account and created its agents there. Oops.
- Since having stage create its agents in the wrong AWS account is not ideal, I ran my Ansible configuration playbook against stage. Three restarts later, Jenkins hadn't crashed on any of them. Stage configuration was successful!
- From the UI, I disabled quiet mode, but I noticed the builds were not starting. The log showed:
    2021-09-07 20:19:11.628+0000 [id=29] SEVERE hudson.triggers.SafeTimerTask#run: Timer task hudson.model.Queue$MaintainTask@7a94f7bb failed
    java.lang.IllegalStateException: The class jenkins.security.QueueItemAuthenticatorConfiguration was not found, potentially not yet loaded
        at hudson.ExtensionList.getInstance(ExtensionList.java:166)
        at jenkins.security.QueueItemAuthenticatorConfiguration.get(QueueItemAuthenticatorConfiguration.java:61)
        at jenkins.security.QueueItemAuthenticatorConfiguration$ProviderImpl.getAuthenticators(QueueItemAuthenticatorConfiguration.java:70)
        at jenkins.security.QueueItemAuthenticatorProvider$IteratorImpl.hasNext(QueueItemAuthenticatorProvider.java:44)
        at hudson.model.Queue$Item.authenticate(Queue.java:2331)
        at hudson.model.Node.canTake(Node.java:401)
        at hudson.model.Queue.makeFlyWeightTaskBuildable(Queue.java:1736)
        at hudson.model.Queue.makeBuildable(Queue.java:1698)
        at hudson.model.Queue.maintain(Queue.java:1546)
        at hudson.model.Queue$MaintainTask.doRun(Queue.java:2902)
        at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:91)
        at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

- So I restarted Jenkins one more time (again with the configuration my playbook had left on the previous restart, no changes), and suddenly got this error:

    java.lang.IllegalStateException: Expected 1 instance of jenkins.security.s2m.AdminWhitelistRule but got 0
        at hudson.ExtensionList.lookupSingleton(ExtensionList.java:451)
        at io.jenkins.plugins.casc.core.AdminWhitelistRuleConfigurator.instance(AdminWhitelistRuleConfigurator.java:59)
        at io.jenkins.plugins.casc.core.AdminWhitelistRuleConfigurator.instance(AdminWhitelistRuleConfigurator.java:42)
        at io.jenkins.plugins.casc.BaseConfigurator.check(BaseConfigurator.java:286)
        at io.jenkins.plugins.casc.BaseConfigurator.configure(BaseConfigurator.java:351)
        at io.jenkins.plugins.casc.BaseConfigurator.check(BaseConfigurator.java:287)
        at io.jenkins.plugins.casc.ConfigurationAsCode.lambda$checkWith$8(ConfigurationAsCode.java:777)
        at io.jenkins.plugins.casc.ConfigurationAsCode.invokeWith(ConfigurationAsCode.java:713)
        at io.jenkins.plugins.casc.ConfigurationAsCode.checkWith(ConfigurationAsCode.java:777)
        at io.jenkins.plugins.casc.ConfigurationAsCode.configureWith(ConfigurationAsCode.java:762)
        at io.jenkins.plugins.casc.ConfigurationAsCode.configureWith(ConfigurationAsCode.java:638)
        at io.jenkins.plugins.casc.ConfigurationAsCode.configure(ConfigurationAsCode.java:307)
        at io.jenkins.plugins.casc.ConfigurationAsCode.init(ConfigurationAsCode.java:299)

This error has shown up before. Usually another restart clears it, but I've now restarted Jenkins about four times and the error is still there. I'm hoping this reproducible state will let us dig further into what's going on.

I have the GC logs, the Jenkins logs, thread dumps, and an SOS report from stage. The latest PID is 2058587, so the most recent GC log is gc-2058587-2021-09-07_16-11-45.log. Some of these would need to be sanitized before I can share them, but let me know if any of them would be useful.

First and foremost, is there a fix for this? Second, is this a known bug?

Best Regards,
Doug Whitfield

--
You received this message because you are subscribed to the Google Groups "Jenkins Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-users/a081f839-cd67-48ce-b4d4-a8ae73777019n%40googlegroups.com.