Hello!
I have done the shutdown and reboot as requested -- please let me know
if you see any bad effects.
Regarding the Horizon login... this may be a case of
https://phabricator.wikimedia.org/T383370, in which case clearing
cookies or trying a different browser would work. However, I've
discovered a different Horizon login issue while traveling, which is
that the network where I'm staying today blocks high-numbered port
access; a complete login to Horizon requires access to
https://openstack.codfw1dev.wikimediacloud.org:25000 (as part of
authentication). So if you have a way to unblock port 25000 that may
also solve the problem.
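
If it helps to rule the port question in or out, here is a minimal sketch of a reachability check in Python. It only tests whether a TCP connection can be opened. It does not attempt any part of the Horizon or authentication flow. The function name port_is_reachable and the 5-second default timeout are my own choices for illustration, not anything Horizon provides.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    A False result covers refused connections, timeouts, and DNS failures,
    so it shows the port is unreachable from this network but not why.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and name-resolution errors are all OSError subclasses.
        return False
```

Running port_is_reachable("openstack.codfw1dev.wikimediacloud.org", 25000) from the affected network and getting False, while the same call returns True from another network, would point toward the high-port blocking rather than the cookie issue.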
Please let me know if you sort out the Horizon login issue. I'm
concerned that the blocked high port might be a widespread problem that
I just haven't heard much about.
Thank you!
-Andrew
On 2/2/25 5:33 AM, Dirk Hünniger via Cloud wrote:
Hi Andrew,
I just added a @reboot line to the crontab of the mediawiki2latex
instance, and everything came up well after I rebooted from the
command line. Unfortunately I cannot log into Horizon at the
moment, so I cannot issue a hard reset as you requested. But it is
perfectly OK for me if you do it any time you like. I just think it is
a good idea to shut down the machine normally before you hard-reset it
in order to avoid any data corruption.
Thanks a lot for your help.
Yours Dirk
On 1/31/25 16:27, Andrew Bogott wrote:
The issue that resulted in partial VM reboots last week[0] (see "VM
reboots coming Tuesday, 2024-01-20") turns out to be more widespread,
affecting virtually all instances. The primary symptom is that it
restricts our ability to drain and maintain WMCS hardware.
In order to resolve the issue, all that's needed is a hard reboot of
each VM. Note that a simple in-place reboot (for instance, one issued
from the shell of the VM) does NOT resolve the issue. Hard reboots must
be performed via Horizon, by selecting 'Hard Reboot Instance' on the
Instances panel.
So, please, at your convenience, log into Horizon and hard reboot any
of your VMs that appear on the list below. For informal coordination
I have also populated an etherpad[1] with the list of affected VMs.
Anything that remains in need of a reboot by late next week
(Thursday, February 6th) I will reboot for you. So, if you don't care
when/if your VM is rebooted you can ignore this message :)
Thank you! And, sorry for the inconvenience. We have a plan in
place[2] which should prevent this issue from re-appearing in the
future.
-Andrew
_______________________________________________
Cloud mailing list -- cloud@lists.wikimedia.org
List information:
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/