For the record,

After fixing the major file system issues with guestfish, and since the DB
still would not start, I removed everything from the DB data directory and
re-initialized it as follows:

rm -rf /var/opt/rh/rh-postgresql10/lib/pgsql/data/*
/opt/rh/rh-postgresql10/root/usr/bin/postgresql-setup --initdb
systemctl restart rh-postgresql10-postgresql.service

Then I proceeded with the restore, asking engine-backup to provision all
missing databases:

engine-backup --mode=restore --file=engine-backup.gz \
--provision-all-databases \
--log=restore.log --restore-permissions

Following this, I ran engine-setup, as instructed by the restore output.
I regained access to the engine web UI, and the VMs that had kept running
were shown as up without issues.
Only one VM could not start, due to an illegal volume, but that's
another story.
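For anyone repeating this, the sequence above can be collected into one script so that a failing step stops the run (set -e). This is a sketch, not a tested procedure: by default it only prints the commands (DRY_RUN=1), and the variable and function names are illustrative, not part of any oVirt tool. The paths and service name are the rh-postgresql10 ones used above; the rm -rf wipes the existing database cluster, so only run it for real with a verified backup at hand.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the DB re-initialization and restore described above.
# Names (run, rebuild_and_restore, PGDATA_DIR, BACKUP_FILE) are illustrative.
set -euo pipefail

# DRY_RUN=1 (default) only prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

PGDATA_DIR="/var/opt/rh/rh-postgresql10/lib/pgsql/data"
BACKUP_FILE="engine-backup.gz"   # relative: run from the backup's directory

rebuild_and_restore() {
    # 1. Wipe the damaged cluster and initialize an empty one.
    run rm -rf "$PGDATA_DIR"/*
    run /opt/rh/rh-postgresql10/root/usr/bin/postgresql-setup --initdb
    run systemctl restart rh-postgresql10-postgresql.service
    # 2. Restore, letting engine-backup create the missing databases.
    run engine-backup --mode=restore --file="$BACKUP_FILE" \
        --provision-all-databases --log=restore.log --restore-permissions
    # 3. Re-run engine-setup, as the restore output instructs.
    run engine-setup
}
```

Calling rebuild_and_restore prints the five commands in order; review them, then rerun with DRY_RUN=0 to execute.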


On Thu, Nov 19, 2020 at 9:42 PM Alex K <rightkickt...@gmail.com> wrote:

>
>
> On Thu, Nov 19, 2020 at 5:31 PM Alex K <rightkickt...@gmail.com> wrote:
>
>> Hi Didi,
>>
>> On Thu, Nov 19, 2020 at 5:13 PM Yedidyah Bar David <d...@redhat.com>
>> wrote:
>>
>>> On Thu, Nov 19, 2020 at 4:37 PM Alex K <rightkickt...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have a corrupt self-hosted engine (with several file system errors,
>>>> and postgres unable to start), so it does not give access to the web UI.
>>>> This happened following an unlucky split-brain resolution (I am running 2
>>>> nodes). The two hosts are also running VMs, which I would like to keep
>>>> running as they are needed.
>>>>
>>>> When trying to boot into rescue mode (using the
>>>> systemd.unit=emergency.target boot parameter), I get only a cursor and
>>>> nothing else.
>>>>
>>>
>>> This means that more than just the DB is corrupt...
>>>
>>>
>>>>
>>>> I have backups of the engine taken with the engine-backup tool, with
>>>> scope all.
>>>> What is the best approach: try to fix the engine, or redeploy?
>>>>
>>>
>>> If you are careful, and know what you are doing, you can try something
>>> like the following. I am not giving many details; hopefully you can find
>>> tutorials on the net about how to use the things I suggest:
>>>
>>> 1. Move to global maintenance
>>>
>>> 2. Stop the current dead vm (if needed)
>>>
>>> 3. Find the current VM conf, edit it to boot from a rescue ISO image of
>>> your preference or from net/PXE etc., and start the VM with '--vm-conf'
>>> pointing to your edited file.
>>>
>>> 4. Connect a console (hosted-engine --console, or 'virsh console', or
>>> use '--add-console-password' and remote viewer, if needed)
>>>
>>> 5. Clean the disk and install the OS, oVirt, etc.
>>>
>>> 6. Copy your backup into the vm and restore with engine-backup
>>>
>>> 7. Then cleanly stop the machine, exit global maint, and let HA start it
>>> (or start it yourself with --vm-start).
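The host-side part of the seven steps above could be sketched roughly as the following shell script. It is a dry run by default: each command is only printed unless DRY_RUN=0 is set. The vm.conf location (/var/run/ovirt-hosted-engine-ha/vm.conf) and the copy name /root/rescue-vm.conf are assumptions for illustration, not taken from the thread, and using --vm-poweroff for step 2 is one possible choice. Editing the copy to boot from a rescue ISO or PXE, and the reinstall and restore inside the VM (steps 5 and 6), remain manual.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the "reinstall the engine VM in place" outline.
# ASSUMPTION: the vm.conf path and the /root/rescue-vm.conf name are
# illustrative; check where your hosts keep the current engine VM conf.
set -euo pipefail

# DRY_RUN=1 (default) only prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

CURRENT_CONF="/var/run/ovirt-hosted-engine-ha/vm.conf"  # assumed location
RESCUE_CONF="/root/rescue-vm.conf"                      # hypothetical copy

# Steps 1-4, run on one of the hosts.
start_rescue_boot() {
    run hosted-engine --set-maintenance --mode=global     # 1. global maintenance
    run hosted-engine --vm-poweroff                       # 2. stop the dead VM, if still up
    run cp "$CURRENT_CONF" "$RESCUE_CONF"                 # 3. copy the current conf...
    # ...edit $RESCUE_CONF by hand to boot from a rescue ISO or PXE, then:
    run hosted-engine --vm-start --vm-conf="$RESCUE_CONF"
    run hosted-engine --console                           # 4. attach a console
}

# Steps 5-6 (reinstall OS and oVirt, engine-backup --mode=restore) happen
# inside the VM. Step 7, back on the host:
finish_rescue() {
    run hosted-engine --vm-shutdown                       # 7. clean stop
    run hosted-engine --set-maintenance --mode=none       # leave global maintenance so HA can start the VM
}
```

Calling start_rescue_boot in dry-run mode prints the five host-side commands in order, and finish_rescue prints the two closing ones; inspect them before rerunning with DRY_RUN=0.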
>>>
>>> At the time, we had a bug [1] to document this. The result is [2]. It
>>> does not detail how to boot/reinstall the OS etc., only how to restore
>>> (e.g. if the DB is dead but the fs is OK).
>>> For something somewhat similar to what you want, see also [3], which
>>> uses guestfish. It might be useful, depending on how badly your disk is
>>> corrupted.
>>>
>> I went with the guestfish approach. It fixed some fs issues, and now
>> yum etc. seem fine, apart from postgres.
>> I had previously tried to uninstall/reinstall packages, so I ended up
>> installing them again with yum install ovirt\*setup\*.
>> Now I think I have to run engine-setup, but I get the error:
>>
>>  Failed to execute stage 'Environment setup': Cannot connect to Engine
>> database using existing credentials: engine@localhost:5432
>>
> It seems that I need to have PostgreSQL running to be able to run
> engine-backup --mode=restore. Are there any steps for manually preparing
> PostgreSQL for oVirt so that I can attempt the restore?
>
>>
>> So I guess I need to follow [2]. What do you think?
>>
>>
>>> How did you run into a split brain? There is a lock on the shared
>>> storage that should prevent this.
>>>
>>> Good luck and best regards,
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1482710
>>> [2] https://www.ovirt.org/documentation/administration_guide/#Overwriting_a_Self-Hosted_Engine
>>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1569827#c4
>>> --
>>> Didi
>>>
>>
_______________________________________________
Users mailing list -- users@ovirt.org
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/SU6V565Y5GAZ67FF5MUDGFLEJ2L2LZV7/
