On Mon, Feb 7, 2022 at 1:27 PM Gilboa Davara <[email protected]> wrote:
>
> Hello,
>
> On Mon, Feb 7, 2022 at 8:45 AM Yedidyah Bar David <[email protected]> wrote:
>>
>> On Sun, Feb 6, 2022 at 5:09 PM Gilboa Davara <[email protected]> wrote:
>> >
>> > Unlike my predecessor, I not only lost my vmengine, I also lost the vdsm 
>> > services on all hosts.
>> > All seem to be hitting the same issue - read, the certs under  
>> > /etc/pki/vdsm/certs and /etc/pki/ovirt* all expired a couple of days ago.
>> > As such, the hosted engine cannot go into global maintenance mode,
>>
>> What do you mean by that? What happens if you 'hosted-engine
>> --set-maintenance --mode=global'?
>
>
> Failed, stating the cluster is not in global maintenance mode.

Please clarify, and/or share relevant logs, if you have them.

You had a semi-working existing HE cluster.
You ran engine-backup on it, took a backup, while it was _not_ in
global maintenance.
That's ok and expected.

Then you took one of the hosts and evacuated it (or just used a new one),
(re)installed the OS (or somehow cleaned it up), and ran
'hosted-engine --deploy --import-from-file' with the backup you took.
This failed? Where exactly and with what error?

If it's the engine-setup running inside the engine VM, with the same
error as when running 'engine-setup' (perhaps with --offline) manually,
then this shouldn't happen at this point:
- engine-backup --mode=restore sets vdc option in the db 'DbJustRestored'
- engine-setup checks this and sets its own env[JUST_RESTORED] accordingly

> (Understandable, given two of 3 hosts were offline due to certificate 
> issues...)
>
>
>>
>>
>> > preventing engine-setup --offline from running.
>>
>> Actually just a few days ago I pushed a patch for:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1700460
>>
>> But:
>>
>> If you really have a problem that you can't set global maintenance,
>> using this is a risk - HA might intervene in the middle and shut down
>> the VM. So either make sure global maintenance does work, or stop
>> all HA services on all hosts.
>>
>> > Two questions:
>> > 1. Is there any automated method to renew the vdsm certificates?
>>
>> You mean, without an engine?
>>
>> I think that if you have a functional engine one way or another,
>> you can automate this somehow, but I didn't check. Try checking e.g. the
>> python sdk examples - there might be something there that you can base
>> this on.
>>
>> > 2. Assuming the previous answer is "no", assuming I'm somewhat versed in 
>> > using openssl, how can I manually renew them?
>>
>> I'd rather not try to invent from memory how this is supposed to work,
>> and doing this methodically and verifying before replying is quite
>> an effort.
>>
>> If this is really what you want, I suggest something like:
>>
>> 1. Set up a test env with an engine and one host
>> 2. Backup (or use git on) /etc on both
>> 3. Renew the host cert from the UI
>> 4. Check what changed
>>
>> You should find, IMO, that the key(s) on the host didn't
>> change. I guess you might also find CSRs on one or both of them.
>> So basically it should be something like:
>> 1. Create a CSR on the host for the existing key (one or more,
>> not sure).
>> 2. Copy and sign this on the engine using pki-enroll-request.sh
>> (I think you can find examples for it scattered around, perhaps
>> even in the main guides)
>> 3. Copy back the generated certs to the host
>> 4. Perhaps restart one or more services there (vdsm, imageio?,
>> ovn, etc.)
>>
>> You can check the code in
>> /usr/share/ovirt-engine/ansible-runner-service-project/project
>> to see how it's done when initiated from the UI.
>>
>> Good luck and best regards,
>
>
> I more or less found a document stating the above somewhere in the middle of 
> the night.
> Tried it.
> Got the WebUI working again.
> However, for the life of me I couldn't get the hosts to talk to the 
> engine (even though I could use openssl s_client -showcerts -connect host 
> and got valid certs).
> In the end, at around 4am, I decided to take the brute-force route: clean the 
> hosts, upgrade them to -streams, and redeploy the engine again (3rd attempt, 
> after a sufficient amount of coffee reminded me that qemu-6.1 is broken and 
> needed to be downgraded before trying to deploy the HE...).
> Either way, when I finish importing the VMs, I'll open an RFE to add a 
> BIG-WARNING-IN-BOLD-LETTERS in the WebUI to notify the admin that the 
> certificates are about to expire.

You should have already received such warnings, no?

https://bugzilla.redhat.com/show_bug.cgi?id=1258585
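
Independently of any warnings from the engine, expiry can also be checked
directly on a host. Below is a minimal sketch, not something taken from
oVirt itself: it reads the end date with 'openssl x509 -noout -enddate' and
computes the days remaining. The certificate path (VDSM_CERT) and the helper
names are assumptions for illustration, so adjust them to your setup.

```python
import ssl
import subprocess
import time

# Assumption: location of the vdsm host certificate; verify on your hosts.
VDSM_CERT = "/etc/pki/vdsm/certs/vdsmcert.pem"


def parse_enddate(openssl_output: str) -> float:
    """Convert a line such as 'notAfter=Jan  5 09:34:43 2018 GMT'
    (as printed by 'openssl x509 -noout -enddate') to epoch seconds."""
    _, _, value = openssl_output.strip().partition("=")
    return ssl.cert_time_to_seconds(value)


def days_left(not_after: float, now: float) -> float:
    """Days between 'now' and 'not_after'; negative if already expired."""
    return (not_after - now) / 86400.0


def cert_days_left(path: str = VDSM_CERT) -> float:
    """Run openssl on the given PEM file and return the days until expiry."""
    out = subprocess.run(
        ["openssl", "x509", "-noout", "-enddate", "-in", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return days_left(parse_enddate(out), time.time())
```

Calling cert_days_left() on each host, e.g. from a cron job, and alerting
when the result drops below some threshold (say 30 days, an arbitrary
choice) would give an early warning that does not depend on the engine.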

Best regards,
-- 
Didi
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/VP542G6HCJVPBYK36C2W5UKHSLYGWMST/
