On Mon, Feb 7, 2022 at 1:27 PM Gilboa Davara <[email protected]> wrote:
>
> Hello,
>
> On Mon, Feb 7, 2022 at 8:45 AM Yedidyah Bar David <[email protected]> wrote:
>>
>> On Sun, Feb 6, 2022 at 5:09 PM Gilboa Davara <[email protected]> wrote:
>> >
>> > Unlike my predecessor, I not only lost my vmengine, I also lost the vdsm
>> > services on all hosts.
>> > All seem to be hitting the same issue - read, the certs under
>> > /etc/pki/vdsm/certs and /etc/pki/ovirt* all expired a couple of days ago.
>> > As such, the hosted engine cannot go into global maintenance mode,
>>
>> What do you mean by that? What happens if you 'hosted-engine
>> --set-maintenance --mode=global'?
>
>
> Failed, stating the cluster is not in global maintenance mode.

Please clarify, and/or share relevant logs, if you have them.

You had a semi-working existing HE cluster. You ran engine-backup on it,
took a backup, while it was _not_ in global maintenance. That's ok and
expected.

Then you took one of the hosts and evacuated it (or just a new one),
(re)installed the OS (or somehow cleaned it up), and ran
'hosted-engine --deploy --import-from-file' with the backup you took.
This failed? Where exactly and with what error?

If it's the engine-setup running inside the engine VM, with the same
error as when running 'engine-setup' (perhaps with --offline) manually,
then this shouldn't happen at this point:

- engine-backup --mode=restore sets vdc option in the db 'DbJustRestored'
- engine-setup checks this and sets its own env[JUST_RESTORED] accordingly

> (Understandable, given two of 3 hosts were offline due to certificate
> issues...)
>
>
>>
>>
>> > preventing engine-setup --offline from running.
>>
>> Actually just a few days ago I pushed a patch for:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1700460
>>
>> But:
>>
>> If you really have a problem that you can't set global maintenance,
>> using this is a risk - HA might intervene in the middle and shutdown
>> the VM. So either make sure global maintenance does work, or stop
>> all HA services on all hosts.
>>
>> > Two questions:
>> > 1. Is there any automated method to renew the vdsm certificates?
>>
>> You mean, without an engine?
>>
>> I think that if you have a functional engine one way or another,
>> you can automate this somehow, didn't check. Try checking e.g. the
>> python sdk examples - there might be there something you can base
>> on.
>>
>> > 2. Assuming the previous answer is "no", assuming I'm somewhat versed in
>> > using openssl, how can I manually renew them?
>>
>> I'd rather not try to invent from memory how this is supposed to work,
>> and doing this methodically and verifying before replying is quite
>> an effort.
>>
>> If this is really what you want, I suggest something like:
>>
>> 1. Set up a test env with an engine and one host
>> 2. Backup (or use git on) /etc on both
>> 3. Renew the host cert from the UI
>> 4. Check what changed
>>
>> You should find, IMO, that the key(s) on the host didn't
>> change. I guess you might also find CSRs on one or both of them.
>> So basically it should be something like:
>> 1. Create a CSR on the host for the existing key (one or more,
>> not sure).
>> 2. Copy and sign this on the engine using pki-enroll-request.sh
>> (I think you can find examples for it scattered around, perhaps
>> even in the main guides)
>> 3. Copy back the generated certs to the host
>> 4. Perhaps restart one or more services there (vdsm, imageio?,
>> ovn, etc.)
>>
>> You can check the code in
>> /usr/share/ovirt-engine/ansible-runner-service-project/project
>> to see how it's done when initiated from the UI.
>>
>> Good luck and best regards,
>
>
> I more or less found a document stating the above somewhere in the middle of
> the night.
> Tried it.
> Got the WebUI working again.
> However, for the life of me I couldn't get the hosts to talk to the
> engine. (Even though I could use openssl s_client -showcerts -connect host
> and got valid certs).
> In the end, @around ~4am, I decided to take the brute force route, clean the
> hosts, upgrade them to -streams, and redeploy the engine again (3rd attempt,
> after sufficient amount of coffee reminded me the qemu-6.1 is broken, and
> needed to be downgraded before trying to deploy the HE...).
> Either way, when I finish importing the VMs, I'll open an RFE to add
> BIG-WARNING-IN-BOLD-LETTERS in the WebUI to notify the admin that the
> certificates are about to expire.

You should have already received them, no?

https://bugzilla.redhat.com/show_bug.cgi?id=1258585

Best regards,
-- 
Didi
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/VP542G6HCJVPBYK36C2W5UKHSLYGWMST/
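
Since the thread comes down to certificates expiring unnoticed, here is a minimal sketch of a check that could be run on a host (from cron, for instance) to report how many days each certificate has left. It is not how the engine does it. It only assumes that the files under /etc/pki/vdsm/certs (the directory mentioned above) are PEM certificates matching `*.pem`, and that the `openssl` CLI is installed. The 30-day threshold and the function names are made up for illustration.

```python
import ssl
import subprocess
import time
from pathlib import Path

# Directory named in the thread; the "*.pem" pattern is an assumption.
CERT_DIR = Path("/etc/pki/vdsm/certs")
WARN_DAYS = 30  # hypothetical threshold, pick whatever suits you


def read_not_after(cert_path):
    """Return the notAfter field of a PEM certificate using the openssl CLI."""
    out = subprocess.run(
        ["openssl", "x509", "-enddate", "-noout", "-in", str(cert_path)],
        check=True,
        capture_output=True,
        text=True,
    ).stdout.strip()
    # openssl prints a single line of the form "notAfter=<date> GMT".
    return out.split("=", 1)[1]


def days_left(not_after, now=None):
    """Days until expiry; negative when the certificate has already expired."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0


def classify(days, warn_days=WARN_DAYS):
    """Map the remaining days to a coarse status string."""
    if days < 0:
        return "EXPIRED"
    if days < warn_days:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    for cert in sorted(CERT_DIR.glob("*.pem")):
        remaining = days_left(read_not_after(cert))
        print(f"{classify(remaining):8} {remaining:8.1f} days  {cert}")
```

Run it as root on each host. Anything reported as WARNING or EXPIRED is a candidate for renewal from the UI while the engine can still reach the host.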

