I'm going to provide a simplified description of a problem which caused
all the jobs on a Jenkins instance to fail. Let me know where I can include
more detail to help uncover the root cause.

We have a Jenkins instance with the Pipeline plugin installed. There are a
dozen jobs, all scheduled to run at different times overnight. Every job has
a single Pipeline script which constructs an object and calls a single
method on that object. Apart from the parameters, the scripts are identical
across all jobs. We wrote the class ourselves, and it performs fairly
complex operations under the hood. The jobs were created and configured by
administrators; non-administrators have never touched any job
configuration. The jobs had run successfully for many weeks. Until today,
we had never needed to manage the Script Security plugin explicitly.
Presumably this is because script security had been behaving as described
under "Script Approval" on the Script Security plugin information page
<https://wiki.jenkins-ci.org/display/JENKINS/Script+Security+Plugin>. We do
not use the Groovy sandbox.

Last night, all of the jobs failed with the following console output:

Started by timer
org.jenkinsci.plugins.scriptsecurity.scripts.UnapprovedUsageException: script not yet approved for use
        at org.jenkinsci.plugins.scriptsecurity.scripts.ScriptApproval.using(ScriptApproval.java:459)
        at org.jenkinsci.plugins.workflow.cps.CpsFlowDefinition.create(CpsFlowDefinition.java:105)
        at org.jenkinsci.plugins.workflow.cps.CpsFlowDefinition.create(CpsFlowDefinition.java:58)
        at org.jenkinsci.plugins.workflow.job.WorkflowRun.run(WorkflowRun.java:206)
        at hudson.model.ResourceController.execute(ResourceController.java:98)
        at hudson.model.Executor.run(Executor.java:410)
Finished: FAILURE


No operations were performed yesterday that we suspect could have caused
this error (no job configuration changes, no plugin upgrades, no clicking
the button that clears script approvals). When an admin ran the jobs
manually, they continued to fail with the same error. We got one job to
complete successfully again by saving its configuration without making
any changes, but saving a single job's configuration did not fix the error
for the rest of the jobs. Re-saving the configuration of each job
individually did get those jobs running again. Until then, nothing had
appeared in the Script Authorization queue. Once half of the jobs were
fixed, we restarted the Jenkins service. The other half still failed when
run, but now the Script Authorization queue was populated with the
Pipeline scripts for those broken jobs. Approving the scripts from the
queue fixed the remaining jobs.

My best guess about what happened is as follows (unexpected behaviors in *bold*):

   1. The script authorization white list was correctly populated as jobs 
   were added and configured with Pipeline scripts by Jenkins admins
   2. The jobs ran successfully for some time
   3. *An Unknown Event caused the script authorization white list to be 
   cleared*
   4. The jobs started failing with the UnapprovedUsageException error
   5. *Some failure caused the script authorization queue to not be 
   populated with the failed scripts*
   6. Having an admin re-save the configuration for a job successfully 
   re-authorized the Pipeline script for that job
   7. Restarting the Jenkins service fixed the error with the script 
   authorization queue
   8. Approving from the script authorization queue worked as expected
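To make the first guess more concrete, here is a toy model of a
hash-keyed approval whitelist. It is a hypothetical sketch: the class and
method names are ours, not the Script Security plugin's API, and it assumes
an approval is keyed by a SHA-1 digest of the language and script text (our
understanding of how approved scripts are identified, which may differ
between plugin versions). It walks through steps 1, 3, 4, 6 and 8; because
it always queues unapproved scripts, it deliberately does not reproduce the
missing-queue failure in step 5.

```python
import hashlib


def script_hash(language, script):
    # Assumption: an approval is identified by a digest of "language:script",
    # so any change to the script text (e.g. a parameter) yields a new key.
    return hashlib.sha1(f"{language}:{script}".encode("utf-8")).hexdigest()


class ScriptApprovalModel:
    """Hypothetical, simplified model of a hash-keyed script whitelist."""

    def __init__(self):
        self.approved = set()   # hashes of approved scripts
        self.pending = {}       # hash -> script text awaiting approval

    def save_config(self, script, by_admin):
        # Steps 1 and 6: an admin saving the job approves its script.
        h = script_hash("groovy", script)
        if by_admin:
            self.approved.add(h)
            self.pending.pop(h, None)
        elif h not in self.approved:
            self.pending[h] = script

    def run(self, script):
        # Step 4: each run checks the whitelist; unknown scripts are queued.
        h = script_hash("groovy", script)
        if h in self.approved:
            return "SUCCESS"
        self.pending[h] = script
        return "FAILURE: script not yet approved for use"

    def clear_approvals(self):
        # Step 3: the "Unknown Event" that wipes the whitelist.
        self.approved.clear()

    def approve_pending(self):
        # Step 8: approve everything waiting in the queue.
        self.approved.update(self.pending)
        self.pending.clear()


if __name__ == "__main__":
    # Hypothetical scripts that differ only in their parameters.
    job_a = "new Nightly(param: 'a').execute()"
    job_b = "new Nightly(param: 'b').execute()"
    model = ScriptApprovalModel()
    for s in (job_a, job_b):
        model.save_config(s, by_admin=True)
    model.clear_approvals()
    print(model.run(job_a))                  # FAILURE: script not yet approved for use
    model.save_config(job_a, by_admin=True)  # re-saving fixes job A only
    print(model.run(job_a))                  # SUCCESS
    print(model.run(job_b))                  # FAILURE: script not yet approved for use (job B is queued)
    model.approve_pending()
    print(model.run(job_b))                  # SUCCESS
```

If approvals really are keyed on the script text, re-saving one job would
only re-approve that job's script, which would be consistent with having to
re-save each job individually.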

Assuming my guess is more or less correct, I am most interested in 
diagnosing the Unknown Event so that we can take steps to prevent it from 
happening in the future. If that is not possible, steps to prevent the 
Script Security plugin from ever blocking jobs would also be appreciated. 
In our case, job stability is much more valuable than the security benefits 
provided by the plugin. I am also interested in the strange behavior of the 
script authorization queue, but that is not critical.

I think it is less plausible, but here is another guess that would also
explain this behavior (unexpected behaviors in *bold*):

   1. *The Script Security plugin had never been running correctly, so its
   white list was not populated*
   2. *The jobs ran successfully because the Script Security plugin was 
   broken and did not stop them*
   3. *An Unknown Event caused the Script Security plugin to suddenly start 
   working*
   4. The jobs started failing because their Pipeline scripts were not yet 
   on the white list
   5. *Some failure caused the script authorization queue to not be 
   populated with the failed scripts*
   6. Having an admin re-save the configuration for a job successfully 
   authorized the Pipeline script for that job for the first time
   7. Restarting the Jenkins service fixed the error with the script 
   authorization queue
   8. Approving from the script authorization queue worked as expected


I would appreciate any help in figuring out what happened and how to
prevent it from recurring. Please let me know if there is anything I can
provide or do to make this easier. Thank you,

Daniel Koverman


-- 
You received this message because you are subscribed to the Google Groups 
"Jenkins Users" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jenkinsci-users/97325caa-d60b-4f7a-9498-d24622a1b48e%40googlegroups.com.