Hi folks,
As we continue our investigation, here is an update:

- A number of job config.xml and “folder configuration” XML files appear to have vanished during a time window we have not yet determined. We believe this happened between the last Jenkins restart and 17 Feb. We are still investigating the exact window, but our priority has been restoring from backups.

- Because Jenkins caches the config.xml files while running, the problem was not noticed until the service unexpectedly crashed/restarted on 17 Feb.

- Unfortunately, because the jobs/ directory on ci-builds is extremely large (1.5TB), we are not able to retain many historical backups.

At this point, we believe we will be able to recover the vast majority of job configs, although some may revert to a configuration from early January. We continue to work on data restoration, and are taking great care not to overwrite any new configurations with old backup data.

Infra is working to evaluate and implement a more robust backup strategy for job configurations based on the lessons learned in this event.

Due to the extremely large size of the backup sets, this process will be ongoing for at least another 24 hours. We’ll provide another update as soon as possible within that timeframe.

-Chris
ASF Infra

> On Feb 18, 2021, at 11:09 AM, Chris Lambertus <c...@apache.org> wrote:
>
> On Wednesday 18 Feb approximately 1800 hours UTC, Infra was notified that the
> ci-builds Jenkins service was offline. We contacted our service provider, and
> they restarted the system. After this restart, Infra started receiving
> notifications of missing job configurations.
>
> Upon beginning the investigation, the system again went offline, and required
> another forced restart.
>
> Infra's initial investigation shows a number of missing config.xml files
> which we are in the process of restoring from backup.
>
> Additionally, Infra is evaluating the existing hardware for issues, and
> beginning a migration process to new hardware.
>
> Please bear with us as we restore the config.xml files, and investigate the
> root cause of their disappearance. As part of this recovery, we are exploring
> additional avenues to improve the backup methodologies for job configurations.
>
> We will provide another update once we are further along in the restoration
> process, likely within 24 hours.
>
> -Chris
> ASF Infra
>
>> On Feb 18, 2021, at 1:20 AM, Gavin McDonald <gmcdon...@apache.org> wrote:
>>
>> Hi All,
>>
>> No need for any more messages, all of it is known.
>>
>> On Thu, Feb 18, 2021 at 10:15 AM P. Ottlinger <pottlin...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> On 18.02.21 at 06:05, Gavin McDonald wrote:
>>>> Whilst the service is back, there are quite a few jobs 'missing' since the
>>>> machine went offline.
>>>
>>> Creadur is missing completely:
>>> https://ci-builds.apache.org/job/Creadur/
>>>
>>> :(
>>>
>>> Thanks,
>>> Phil
>>>
>>
>> --
>>
>> *Gavin McDonald*
>> Systems Administrator
>> ASF Infrastructure Team
>