Hi folks,
As we continue our investigation, here is an update:

- A number of job config.xml and “folder configuration” XML files appear to 
have vanished during a time window we have not yet determined. We believe this 
happened between the last Jenkins restart and 17 Feb. We are still 
investigating the exact time window, but our priority has been restoring 
backups.

- Because Jenkins caches the config.xml files while running, the 
problem was not noticed until the service unexpectedly crashed/restarted on 17 
Feb.

- Unfortunately, due to the extremely large jobs/ directory on ci-builds 
(1.5TB), we are not able to retain many historical backups. At this point, we 
believe we will be able to recover the vast majority of job configs, although 
some may revert to a configuration from early January. We continue to work on 
data restoration, and are taking great care to not overwrite any new 
configurations with old backup data.

Infra is working to evaluate and implement a more robust backup strategy for 
job configurations based on the lessons learned in this event.

Due to the extremely large size of the backup sets, this process will be 
ongoing for at least another 24 hours. We’ll provide another update as soon as 
possible within that timeframe.

-Chris
ASF Infra




> On Feb 18, 2021, at 11:09 AM, Chris Lambertus <c...@apache.org> wrote:
> 
> On Wednesday 17 Feb at approximately 1800 hours UTC, Infra was notified that the 
> ci-builds Jenkins service was offline. We contacted our service provider, and 
> they restarted the system. After this restart, Infra started receiving 
> notifications of missing job configurations.
> 
> As we began the investigation, the system went offline again and required 
> another forced restart.
> 
> Infra's initial investigation shows a number of missing config.xml files 
> which we are in the process of restoring from backup.
> 
> Additionally, Infra is evaluating the existing hardware for issues, and 
> beginning a migration process to new hardware.
> 
> Please bear with us as we restore the config.xml files, and investigate the 
> root cause of their disappearance. As part of this recovery, we are exploring 
> additional avenues to improve the backup methodologies for job configurations.
> 
> We will provide another update once we are further along in the restoration 
> process, likely within 24 hours.
> 
> -Chris
> ASF Infra
> 
> 
> 
>> On Feb 18, 2021, at 1:20 AM, Gavin McDonald <gmcdon...@apache.org> wrote:
>> 
>> Hi All,
>> 
>> No need for any more messages, all of it is known.
>> 
>> On Thu, Feb 18, 2021 at 10:15 AM P. Ottlinger <pottlin...@apache.org> wrote:
>> 
>>> Hi,
>>> 
>>> On 18.02.21 at 06:05, Gavin McDonald wrote:
>>>> Whilst the service is back, there are quite a few jobs 'missing' since the
>>>> machine went offline.
>>> 
>>> Creadur is missing completely:
>>> https://ci-builds.apache.org/job/Creadur/
>>> 
>>> :(
>>> 
>>> Thanks,
>>> Phil
>>> 
>> 
>> 
>> -- 
>> 
>> *Gavin McDonald*
>> Systems Administrator
>> ASF Infrastructure Team
> 
