[Cloud] [Cloud-announce] [Incident] Toolforge: Ongoing intermittent DNS resolution issues

2024-11-26 Thread Slavina Stefanova
Hello, We are currently investigating widespread intermittent DNS resolution issues within the Toolforge Kubernetes cluster that began on Sunday. These issues are causing some jobs and deployments to fail, particularly on NFS worker nodes. Impact: - Some tools may experience failed deployment

[Cloud] [Cloud-announce] Re: [Incident] Toolforge: Ongoing intermittent DNS resolution issues

2024-11-26 Thread Slavina Stefanova
Hello everyone, Following up on this incident - the situation has stabilized following control plane node reboots at around 10:25 UTC. *Current status:* - No new DNS-related failures have been observed since the control plane reboots - Tool deployments and jobs are running normally Whi