[Cloud] Re: [Cloud-announce] Project admins, please hard reboot your VMs (or I will)

Andrew Bogott Sun, 02 Feb 2025 07:41:52 -0800

Hello!

I have done the shutdown and reboot as requested -- please let me know if you see any bad effects.

Regarding the Horizon login... this may be a case of https://phabricator.wikimedia.org/T383370, in which case clearing cookies or trying a different browser would work. However, I've discovered a different Horizon login issue while traveling, which is that the network where I'm staying today blocks access to high-numbered ports; a complete login to Horizon requires access to https://openstack.codfw1dev.wikimediacloud.org:25000 (as part of authentication). So if you have a way to unblock port 25000 that may also solve the problem.
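
One way to tell whether a network blocks the high port is to attempt a plain TCP connection to it. The following is only a minimal sketch: the helper name port_reachable and the 5-second timeout are illustrative assumptions, and a successful connection shows only that the port is not blocked, not that the Horizon login itself will work.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it does not perform any
    authentication and says nothing about the service behind the port.
    """
    try:
        # create_connection resolves the host and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False


def check_auth_port() -> None:
    """Report whether the port mentioned in this thread is reachable."""
    host = "openstack.codfw1dev.wikimediacloud.org"  # host taken from the message above
    port = 25000                                     # port taken from the message above
    state = "reachable" if port_reachable(host, port) else "blocked or unreachable"
    print(f"{host}:{port} is {state}")
```

Calling check_auth_port() from the affected network prints a single status line; if it reports the port as blocked while other networks report it as reachable, the high-port restriction is the likely cause.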

Please let me know if you sort out the Horizon login issue; I'm concerned that the high-port problem might be widespread and that I just haven't heard much about it.


Thank you!

-Andrew


On 2/2/25 5:33 AM, Dirk Hünniger via Cloud wrote:

Hi Andrew,
I just added a @reboot line to the crontab of the mediawiki2latex instance, and everything came up well after I rebooted from the command line. Unfortunately I cannot log into Horizon at the moment, so I cannot issue a hard reset as you requested. But it is perfectly OK for me if you do it any time you like. I just think it is a good idea to shut down the machine normally before you hard-reset it in order to avoid any data corruption.
Thanks a lot for your help.

Yours Dirk

On 1/31/25 16:27, Andrew Bogott wrote:
The issue that resulted in partial VM reboots last week[0] (see "VM reboots coming Tuesday, 2024-01-20") turns out to be more widespread, affecting virtually all instances. The primary symptom is that it restricts our ability to drain and maintain WMCS hardware.
In order to resolve the issue, all that's needed is a hard reboot of each VM. Note that a simple in-place reboot (for instance, one issued in the shell of the VM) does NOT resolve the issue. Hard reboots must be performed via Horizon, by selecting 'Hard Reboot Instance' on the Instances panel.
So, please, at your convenience, log into Horizon and hard reboot any of your VMs that appear on the list below. For informal coordination I have also populated an etherpad[1] with the list of affected VMs.
Anything that remains in need of a reboot by late next week (Thursday, February 6th) I will reboot for you. So, if you don't care when/if your VM is rebooted you can ignore this message :)
Thank you! And, sorry for the inconvenience. We have a plan in place[2] which should prevent this issue from re-appearing in the future.
-Andrew
_______________________________________________
Cloud mailing list -- [email protected]
List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/


