On Thu, Nov 19, 2020 at 4:37 PM Alex K <rightkickt...@gmail.com> wrote:

> Hi all,
>
> I have a corrupt self-hosted engine (with several file system errors;
> PostgreSQL is not able to start), so the web UI is not accessible.
> This happened following an unlucky split-brain resolution (I am running 2
> nodes). The two hosts are also running VMs, which I would like to keep
> running as they are needed.
>
> When trying to boot into rescue mode (using the systemd.unit=emergency.target
> boot parameter), I get a cursor and nothing else.
>

This means that more than just the DB is corrupt...


>
> I have backups of the engine files with scope all (using the engine-backup
> tool).
> What is the best approach: try to fix the engine, or redeploy?
>

If you are careful and know what you are doing, you can try something like
the following. I am not giving many details; hopefully you can find
tutorials online about how to use the things I suggest:

1. Move to global maintenance

2. Stop the current dead vm (if needed)

3. Find the current vm conf, edit it to boot from a rescue ISO image of your
choice (or from the network/PXE, etc.), and start the vm with
'hosted-engine --vm-start --vm-conf=' pointing to your edited file.

4. Connect a console (hosted-engine --console, 'virsh console', or, if
needed, '--add-console-password' and remote-viewer)

5. Wipe the disk and reinstall the OS, oVirt, etc.

6. Copy your backup into the vm and restore with engine-backup

7. Then cleanly stop the machine, exit global maintenance, and let HA start
it (or start it yourself with --vm-start).
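For reference, the host-side steps and the restore from the list above could
look roughly like the following shell sketch. It is untested and only
illustrative: the vm.conf path (/var/run/ovirt-hosted-engine-ha/vm.conf), the
use of $EDITOR, the function names, and all file paths are assumptions, not
something from this thread, so check them against your oVirt version. Step 5
(wiping the disk and reinstalling) stays manual. With DRY_RUN=1 the commands
are only printed.

```shell
#!/bin/sh
# Untested sketch of the host-side steps and the restore described above.
# Assumptions, not from the original mail:
#  - /var/run/ovirt-hosted-engine-ha/vm.conf is the host's local copy of the
#    engine VM configuration (verify on your oVirt version);
#  - you edit the copy by hand in $EDITOR to boot from a rescue ISO or PXE;
#  - the function names and all file paths are hypothetical.
# With DRY_RUN=1 every command is printed but not executed.

run_cmd() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != 1 ]; then
        "$@"
    fi
}

# Steps 1-4, on a hosted-engine host. $1: where to write the edited conf.
rescue_boot() {
    conf="$1"
    run_cmd hosted-engine --set-maintenance --mode=global || return
    # Failure is ignored here: the dead VM may already be down.
    run_cmd hosted-engine --vm-poweroff
    run_cmd cp /var/run/ovirt-hosted-engine-ha/vm.conf "$conf" || return
    run_cmd "${EDITOR:-vi}" "$conf" || return
    run_cmd hosted-engine --vm-start --vm-conf="$conf" || return
    # Alternatives: 'virsh console', or --add-console-password + remote-viewer.
    run_cmd hosted-engine --console
}

# Step 6, inside the reinstalled engine VM (step 5 is manual).
# $1: backup file, $2: log file. Running engine-setup afterwards is an
# assumption based on the usual engine-backup restore flow; add
# --provision-dwh-db if your backup also contains the DWH database.
restore_engine() {
    run_cmd engine-backup --mode=restore --file="$1" --log="$2" \
        --provision-db --restore-permissions || return
    run_cmd engine-setup
}

# Step 7, back on the host, once 'hosted-engine --vm-status' shows the VM
# down after a clean shutdown (e.g. 'hosted-engine --vm-shutdown').
# Leaving global maintenance lets HA start it; or run --vm-start yourself.
finish_recovery() {
    run_cmd hosted-engine --set-maintenance --mode=none
}
```

For example, `DRY_RUN=1 rescue_boot /root/rescue-vm.conf` prints the commands
for steps 1-4 without running them, which is a cheap way to review the plan
before touching a production host.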

At the time, we had a bug [1] to document this; the result is [2]. It does
not explain how to boot or reinstall the OS, only how to restore (e.g. if the
DB is dead but the file system is OK).
For something somewhat similar to what you want, see also [3], which uses
guestfish. It might be useful, depending on how badly your disk is corrupted.
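Before deciding between repairing and reinstalling, a read-only look at the
disk with guestfish can show what is still recognizable on it. A minimal
sketch, assuming the engine VM is down and that you know the path of its disk
image (or LV) on the host, which this thread does not give; the function name
is hypothetical and this is not necessarily what [3] does.

```shell
#!/bin/sh
# Hypothetical helper: list partitions and file systems on the engine VM disk
# without modifying it. The disk path ($1) is an assumption: the image file or
# LV backing the engine VM, which depends on your storage type.
# With DRY_RUN=1 the command is only printed.
run_cmd() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != 1 ]; then
        "$@"
    fi
}

inspect_engine_disk() {
    # --ro opens the disk read-only, so nothing is written to it.
    # Commands after the options are separated by ':' (guestfish syntax);
    # 'run' here is the guestfish command that launches the appliance.
    run_cmd guestfish --ro -a "$1" run : list-partitions : list-filesystems
}
```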

How did you run into a split brain? There is a lock on the shared storage
that should prevent this.

Good luck and best regards,

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1482710
[2] https://www.ovirt.org/documentation/administration_guide/#Overwriting_a_Self-Hosted_Engine
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1569827#c4
-- 
Didi
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/2ALJN3CXYNC2UUCEI6H7HX3QU7YWUAML/
