[ovirt-users] Re: Fix corrupt self-hosted engine

Yedidyah Bar David Thu, 19 Nov 2020 07:17:47 -0800

On Thu, Nov 19, 2020 at 5:12 PM Yedidyah Bar David <d...@redhat.com> wrote:


> On Thu, Nov 19, 2020 at 4:37 PM Alex K <rightkickt...@gmail.com> wrote:
>
>> Hi all,
>>
>> I have a corrupt self-hosted engine (with several file system errors,
>> postgres not able to start) and thus it does not give access to the web UI.
>> This happened following an unlucky split brain resolution (I am running 2
>> nodes). The two hosts are running VMs also which I would like to keep
>> running as they are needed.
>>
>> When trying to boot into rescue mode (using systemd.unit=emergency.target
>> boot parameter) I get a cursor and nothing else.
>>
>
> This means that more than just the DB is corrupt...
>
>
>>
>> I have backups of engine files with scope all (using the engine-backup
>> tool).
>> What is the best approach to try and fix the engine, or to redeploy it?
>>
>
> If you are careful and know what you are doing, you can try something
> like the following. I am not giving many details; hopefully you can find
> tutorials online for the tools I suggest:
>
> 1. Move to global maintenance
>
> 2. Stop the current dead vm (if needed)
>
> 3. Find current vm conf, edit it to boot from a rescue iso image of your
> preference or from net/PXE etc., and start the vm with '--vm-conf' pointing
> to your edited file.
>
> 4. Connect a console (hosted-engine --console, or 'virsh console', or use
> '--add-console-password' and remote viewer, if needed)
>
> 5. Clean the disk and install the OS, oVirt, etc.
>
> 6. Copy your backup into the vm and restore with engine-backup
>
> 7. Then cleanly stop the machine, exit global maint, and let HA start it
> (or start it yourself with --vm-start).
>
> At the time, we had a bug [1] to document this. The result is [2]. It
> covers only the restore itself (e.g. if the DB is dead but the file system
> is OK), not how to boot the VM, reinstall the OS, etc.
> For something somewhat similar to what you want, see also [3], which uses
> guestfish. Might be useful, depending on how badly your disk is corrupted.
>
> How did you run into a split brain? There is a lock on the shared storage
> that should prevent this.
>
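
The seven steps in the quoted message can be sketched as an ordered command plan. This is only a dry-run outline under stated assumptions, not a tested procedure. The file names /root/vm-rescue.conf, /root/engine-backup.tar.gz and /root/restore.log are hypothetical. The vm.conf location is where the HA agent usually keeps it and should be verified on your host. The flags shown should be checked against `hosted-engine --help` and `engine-backup --help` for your version. Editing the copied vm.conf to boot from a rescue ISO or PXE (step 3) and reinstalling the OS and oVirt (step 5) are manual and are not modelled here.

```python
import shlex

# Usual location of the engine VM definition on a hosted-engine host.
# This is an assumption; confirm the path on your own host.
DEFAULT_VM_CONF = "/var/run/ovirt-hosted-engine-ha/vm.conf"


def rescue_plan(edited_conf="/root/vm-rescue.conf",
                backup="/root/engine-backup.tar.gz"):
    """Return the recovery steps as ordered (where, argv) pairs.

    Nothing is executed; the caller decides whether to run or just read them.
    Both default file names are hypothetical placeholders.
    """
    return [
        # 1. Global maintenance, so the HA agents leave the engine VM alone.
        ("host", ["hosted-engine", "--set-maintenance", "--mode=global"]),
        # 2. Stop the dead VM, if it is still defined as running.
        ("host", ["hosted-engine", "--vm-poweroff"]),
        # 3. Copy the VM definition; edit the copy by hand to boot rescue media.
        ("host", ["cp", DEFAULT_VM_CONF, edited_conf]),
        ("host", ["hosted-engine", "--vm-start", "--vm-conf=" + edited_conf]),
        # 4. Attach a console.
        ("host", ["hosted-engine", "--console"]),
        # 5. (manual) Clean the disk, install the OS and oVirt packages.
        # 6. Restore the backup inside the VM. Restores are normally followed
        #    by engine-setup, which the thread does not spell out.
        ("engine-vm", ["engine-backup", "--mode=restore",
                       "--file=" + backup, "--log=/root/restore.log",
                       "--provision-all-databases", "--restore-permissions"]),
        # 7. Shut down cleanly, then let HA start the VM again.
        ("engine-vm", ["shutdown", "-h", "now"]),
        ("host", ["hosted-engine", "--set-maintenance", "--mode=none"]),
    ]


def render(plan):
    """Format each step as a shell-quoted line tagged with where it runs."""
    return ["[{}] {}".format(where, shlex.join(argv)) for where, argv in plan]


if __name__ == "__main__":
    print("\n".join(render(rescue_plan())))
```

Printing the plan rather than executing it is deliberate: each step changes the state of a production cluster and should be run by hand, one at a time, after checking the result of the previous one.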

Also, to clarify:

The "official" answer is to deploy a new hosted-engine, on new storage,
with --restore-from-file. IMO this does not let you keep your VMs up, at
least not all of them, and definitely not if you don't have another host to
restore on.

Keeping the VMs up is risky if you have HA VMs, or if you
started/stopped/migrated VMs after you took your backup.
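
For comparison, the "official" redeploy path mentioned above comes down to a single deploy command that takes the backup file. A minimal sketch follows, assuming a hypothetical backup path; the helper name is made up for illustration, and the command is only built, not run.

```python
import shlex


def redeploy_command(backup_file):
    """Build the argv for deploying a new hosted engine from a backup.

    backup_file is a file produced by engine-backup; the path used in the
    example below is a hypothetical placeholder.
    """
    if not backup_file:
        raise ValueError("a backup file path is required")
    return ["hosted-engine", "--deploy", "--restore-from-file=" + backup_file]


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(shlex.join(redeploy_command("/root/engine-backup.tar.gz")))
```

The deployment is interactive and asks for the new storage, so this only shows how the backup is handed to it.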

Best regards,
-- 
Didi

List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZCCYFBSZVUY7YJ7L5Q5U3SG4CI3CPHNN/
