For the record: after fixing the major fs issues with guestfish, the DB was still not starting up, so I removed everything from the DB data dir and recreated it as below:

rm -rf /var/opt/rh/rh-postgresql10/lib/pgsql/data/*
/opt/rh/rh-postgresql10/root/usr/bin/postgresql-setup --initdb
systemctl restart rh-postgresql10-postgresql.service

Then I proceeded with the restoration, asking it to provision all missing databases:

engine-backup --mode=restore --file=engine-backup.gz --provision-all-databases \
  --log=restore.log --restore-permissions

Following this, I ran engine-setup, as instructed by the restore operation.
I regained engine web access and saw that the running VMs were shown as up without issues.
I only observed one VM not able to start due to an illegal volume, but that's another story.

On Thu, Nov 19, 2020 at 9:42 PM Alex K <rightkickt...@gmail.com> wrote:

>
> On Thu, Nov 19, 2020 at 5:31 PM Alex K <rightkickt...@gmail.com> wrote:
>
>> Hi Didi,
>>
>> On Thu, Nov 19, 2020 at 5:13 PM Yedidyah Bar David <d...@redhat.com>
>> wrote:
>>
>>> On Thu, Nov 19, 2020 at 4:37 PM Alex K <rightkickt...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have a corrupt self-hosted engine (with several file system errors,
>>>> postgres not able to start) and thus it does not give access to the web UI.
>>>> This happened following an unlucky split brain resolution (I am running 2
>>>> nodes). The two hosts are running VMs also which I would like to keep
>>>> running as they are needed.
>>>>
>>>> When trying to boot into rescue mode (using
>>>> systemd.unit=emergency.target boot parameter) I get a cursor and nothing
>>>> else.
>>>>
>>>
>>> This means that more than just the DB is corrupt...
>>>
>>>>
>>>> I have backups of engine files with scope all (using the engine-backup
>>>> tool).
>>>> What is the best approach to try and fix the engine or redeploy.
>>>>
>>>
>>> If you are careful, and know what you are doing, you can try something
>>> like the following. I am not giving many details, hopefully you can find on
>>> the net tutorials about how to use the things I suggest:
>>>
>>> 1. Move to global maintenance
>>>
>>> 2. Stop the current dead vm (if needed)
>>>
>>> 3. Find current vm conf, edit it to boot from a rescue iso image of your
>>> preference or from net/PXE etc., and start the vm with '--vm-conf' pointing
>>> to your edited file.
>>>
>>> 4. Connect a console (hosted-engine --console, or 'virsh console', or
>>> use '--add-console-password' and remote viewer, if needed)
>>>
>>> 5. Clean the disk and install the OS, oVirt, etc.
>>>
>>> 6. Copy your backup into the vm and restore with engine-backup
>>>
>>> 7. Then cleanly stop the machine, exit global maint, and let HA start it
>>> (or start it yourself with --vm-start).
>>>
>>> At the time, we had a bug [1] to document this. The result is [2]. It
>>> does not detail how to boot/reinstall os/etc., only restore (if e.g. db is
>>> dead but fs is ok).
>>> For something somewhat similar to what you want, see also [3], which
>>> uses guestfish. Might be useful, depending on how badly your disk is
>>> corrupted.
>>>
>> I went with the guestfish approach. It has fixed some fs issues and now
>> the yum etc seem fine apart from postgres.
>> I had tried previously to uninstall/install packages so I ended
>> installing them again with yum install ovirt\*setup\*.
>> Now I think I have to run engine-setup but I get the error:
>>
>> Failed to execute stage 'Environment setup': Cannot connect to Engine
>> database using existing credentials: engine@localhost:5432
>>
> Seems that I need to have psql running to be able to run engine-backup
> --mode=restore. Are there any steps how one could manually prepare pgsql
> for ovirt so as to attempt restoration?
>
>>
>> So I guess I need to follow [2]. What do you think?
>>
>>
>>> How did you run into a split brain? There is a lock on the shared
>>> storage that should prevent this.
>>>
>>> Good luck and best regards,
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1482710
>>> [2]
>>> https://www.ovirt.org/documentation/administration_guide/#Overwriting_a_Self-Hosted_Engine
>>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1569827#c4
>>> --
>>> Didi
>>>
>>
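
P.S. In case it helps anyone who wants to script the DB recreation and restore steps from the top of this message, here is a rough sketch rather than a tested procedure. The paths, unit name and backup file name are the ones from this thread. The `run` helper, the `restore_engine_db` function and the `DRY_RUN` switch are my own illustrative names, and I swapped `rm -rf .../data/*` for GNU `find ... -mindepth 1 -delete` so that hidden files are removed too. It assumes postgres is not running and that you really do want to discard the old data directory.

```shell
#!/bin/sh
# Sketch of the recovery sequence from this thread; adjust paths for your setup.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"
PGDATA_DIR="${PGDATA_DIR:-/var/opt/rh/rh-postgresql10/lib/pgsql/data}"
PG_SETUP=/opt/rh/rh-postgresql10/root/usr/bin/postgresql-setup
PG_UNIT=rh-postgresql10-postgresql.service
BACKUP_FILE="${BACKUP_FILE:-engine-backup.gz}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

restore_engine_db() {
    # 1. Empty the broken data directory (destructive, GNU find assumed).
    run find "$PGDATA_DIR" -mindepth 1 -delete || return 1
    # 2. Recreate an empty cluster and start the service.
    run "$PG_SETUP" --initdb || return 1
    run systemctl restart "$PG_UNIT" || return 1
    # 3. Restore the backup, provisioning any missing databases.
    run engine-backup --mode=restore --file="$BACKUP_FILE" \
        --provision-all-databases --log=restore.log --restore-permissions || return 1
    # 4. Finish with engine-setup, as the restore output instructs.
    run engine-setup
}

restore_engine_db
```

Run as is, it only prints the five commands prefixed with `+`. Once the printed commands look right for your engine VM, run it again with `DRY_RUN=0` as root.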