On Thu, Mar 7, 2019 at 2:54 PM Strahil Nikolov <[email protected]>
wrote:
>
>
>
> >The OVF_STORE volume is going to get periodically recreated by the engine
> so at least you need a running engine.
>
> >In order to avoid this kind of issue we have two OVF_STORE disks, in your
> case:
>
> >MainThread::INFO::2019-03-06
> 06:50:02,391::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
> Found >OVF_STORE: imgUUID:441abdc8-6cb1-49a4-903f-a1ec0ed88429,
> volUUID:c3309fc0-8707-4de1-903d-8d4bbb024f81
> >MainThread::INFO::2019-03-06
> 06:50:02,748::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
> Found >OVF_STORE: imgUUID:94ade632-6ecc-4901-8cec-8e39f3d69cb0,
> volUUID:9460fc4b-54f3-48e3-b7b6-da962321ecf4
>
> >Can you please check if you have at least the second copy?
>
> Second Copy is empty too:
> [root@ovirt1 ~]# ll
> /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429
> total 66561
> -rw-rw----. 1 vdsm kvm 0 Mar 4 05:23
> c3309fc0-8707-4de1-903d-8d4bbb024f81
> -rw-rw----. 1 vdsm kvm 1048576 Jan 31 13:24
> c3309fc0-8707-4de1-903d-8d4bbb024f81.lease
> -rw-r--r--. 1 vdsm kvm 435 Mar 4 05:24
> c3309fc0-8707-4de1-903d-8d4bbb024f81.meta
>
>
>
> >And even in the case you lost both, we are storing on the shared storage
> the initial vm.conf:
> >MainThread::ERROR::2019-03-06
> >06:50:02,971::config_ovf::70::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::>(_get_vm_conf_content_from_ovf_store)
> Failed extracting VM OVF from the OVF_STORE volume, falling back to initial
> vm.conf
>
> >Can you please check what you have
> in /var/run/ovirt-hosted-engine-ha/vm.conf?
>
> It exists and has the following:
>
> [root@ovirt1 ~]# cat /var/run/ovirt-hosted-engine-ha/vm.conf
> # Editing the hosted engine VM is only possible via the manager UI\API
> # This file was generated at Thu Mar 7 15:37:26 2019
>
> vmId=8474ae07-f172-4a20-b516-375c73903df7
> memSize=4096
> display=vnc
> devices={index:2,iface:ide,address:{ controller:0, target:0,unit:0, bus:1,
> type:drive},specParams:{},readonly:true,deviceId:,path:,device:cdrom,shared:false,type:disk}
> devices={index:0,iface:virtio,format:raw,poolID:00000000-0000-0000-0000-000000000000,volumeID:a9ab832f-c4f2-4b9b-9d99-6393fd995979,imageID:8ec7a465-151e-4ac3-92a7-965ecf854501,specParams:{},readonly:false,domainID:808423f9-8a5c-40cd-bc9f-2568c85b8c74,optional:false,deviceId:a9ab832f-c4f2-4b9b-9d99-6393fd995979,address:{bus:0x00,
> slot:0x06, domain:0x0000, type:pci,
> function:0x0},device:disk,shared:exclusive,propagateErrors:off,type:disk,bootOrder:1}
> devices={device:scsi,model:virtio-scsi,type:controller}
> devices={nicModel:pv,macAddr:00:16:3e:62:72:c8,linkActive:true,network:ovirtmgmt,specParams:{},deviceId:,address:{bus:0x00,
> slot:0x03, domain:0x0000, type:pci,
> function:0x0},device:bridge,type:interface}
> devices={device:console,type:console}
> devices={device:vga,alias:video0,type:video}
> devices={device:vnc,type:graphics}
> vmName=HostedEngine
>
> spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
> smp=1
> maxVCpus=8
> cpuType=Opteron_G5
> emulatedMachine=emulated_machine_list.json['values']['system_option_value'][0]['value'].replace('[','').replace(']','').split(',
> ')|first
> devices={device:virtio,specParams:{source:urandom},model:virtio,type:rng}
>
You should be able to copy it to /root/myvm.conf and start the engine
VM with:
hosted-engine --vm-start --vm-conf=/root/myvm.conf
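For reference, a minimal sketch of that sequence (the /root/myvm.conf path is
just an example name, not something the tools create for you):

  # copy the locally cached fallback config under a new name
  cp /var/run/ovirt-hosted-engine-ha/vm.conf /root/myvm.conf
  # start the engine VM from that explicit config
  hosted-engine --vm-start --vm-conf=/root/myvm.conf
  # then watch it come up
  hosted-engine --vm-status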
>
>
>
> Also, I think this happened when I was upgrading ovirt1 (the last host in the
> gluster cluster) from 4.3.0 to 4.3.1. The engine got restarted because I
> forgot to enable global maintenance.
>
>
> >Sorry, I don't understand
> >Can you please explain what happened?
>
> I updated the engine first -> all OK; next was the arbiter -> again
> no issues with it.
> Next was the empty host -> ovirt2, and everything went OK.
> After that I migrated the engine to ovirt2 and tried to update ovirt1.
> The web UI showed that the installation failed, but "yum update" was
> working.
> During the update of ovirt1 via yum, the engine app crashed and
> restarted on ovirt2.
> After the reboot of ovirt1 I noticed the error about pinging the
> gateway, so I stopped the engine and stopped the following services on
> both hosts (global maintenance):
> ovirt-ha-agent ovirt-ha-broker vdsmd supervdsmd sanlock
>
> Next was a reinitialization of the sanlock space via 'sanlock direct -s'.
> In the end I have managed to power on the hosted-engine and it was running
> for a while.
>
> As the errors did not stop, I decided to shut down everything, then
> power it up, heal gluster and check what would happen.
>
> Currently I'm not able to power up the engine:
>
>
> [root@ovirt1 ovirt-hosted-engine-ha]# hosted-engine --vm-status
>
>
> !! Cluster is in GLOBAL MAINTENANCE mode !!
>
Please note that in global maintenance mode nothing will try to start the
engine VM for you.
I assume you tried to exit global maintenance mode, or at least tried to
start it manually with hosted-engine --vm-start, right?
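If you want the HA agents to start it for you, you first have to leave global
maintenance; roughly (assuming the standard hosted-engine CLI on either host):

  # exit global maintenance so the agents are allowed to act again
  hosted-engine --set-maintenance --mode=none
  # re-enter it later, before any further maintenance work
  hosted-engine --set-maintenance --mode=global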
>
>
>
> --== Host ovirt1.localdomain (id: 1) status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : True
> Hostname : ovirt1.localdomain
> Host ID : 1
> Engine status : {"reason": "vm not running on this
> host", "health": "bad", "vm": "down", "detail": "unknown"}
> Score : 3400
> stopped : False
> Local maintenance : False
> crc32 : 45e6772b
> local_conf_timestamp : 288
> Host timestamp : 287
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=287 (Thu Mar 7 15:34:06 2019)
> host-id=1
> score=3400
> vm_conf_refresh_time=288 (Thu Mar 7 15:34:07 2019)
> conf_on_shared_storage=True
> maintenance=False
> state=GlobalMaintenance
> stopped=False
>
>
> --== Host ovirt2.localdomain (id: 2) status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : True
> Hostname : ovirt2.localdomain
> Host ID : 2
> Engine status : {"reason": "vm not running on this
> host", "health": "bad", "vm": "down", "detail": "unknown"}
> Score : 3400
> stopped : False
> Local maintenance : False
> crc32 : 2e9a0444
> local_conf_timestamp : 3886
> Host timestamp : 3885
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=3885 (Thu Mar 7 15:34:05 2019)
> host-id=2
> score=3400
> vm_conf_refresh_time=3886 (Thu Mar 7 15:34:06 2019)
> conf_on_shared_storage=True
> maintenance=False
> state=GlobalMaintenance
> stopped=False
>
>
> !! Cluster is in GLOBAL MAINTENANCE mode !!
>
> [root@ovirt1 ovirt-hosted-engine-ha]# hosted-engine --vm-start
> Command VM.getStats with args {'vmID':
> '8474ae07-f172-4a20-b516-375c73903df7'} failed:
> (code=1, message=Virtual machine does not exist: {'vmId':
> u'8474ae07-f172-4a20-b516-375c73903df7'})
> [root@ovirt1 ovirt-hosted-engine-ha]# hosted-engine --vm-start
> VM exists and is down, cleaning up and restarting
>
> [root@ovirt1 ovirt-hosted-engine-ha]# hosted-engine --vm-status
>
>
> !! Cluster is in GLOBAL MAINTENANCE mode !!
>
>
>
> --== Host ovirt1.localdomain (id: 1) status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : True
> Hostname : ovirt1.localdomain
> Host ID : 1
> Engine status : {"reason": "bad vm status", "health":
> "bad", "vm": "down", "detail": "Down"}
> Score : 3400
> stopped : False
> Local maintenance : False
> crc32 : 6b086b7c
> local_conf_timestamp : 328
> Host timestamp : 327
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=327 (Thu Mar 7 15:34:46 2019)
> host-id=1
> score=3400
> vm_conf_refresh_time=328 (Thu Mar 7 15:34:47 2019)
> conf_on_shared_storage=True
> maintenance=False
> state=GlobalMaintenance
> stopped=False
>
>
> --== Host ovirt2.localdomain (id: 2) status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : True
> Hostname : ovirt2.localdomain
> Host ID : 2
> Engine status : {"reason": "vm not running on this
> host", "health": "bad", "vm": "down", "detail": "unknown"}
> Score : 3400
> stopped : False
> Local maintenance : False
> crc32 : c5890e9c
> local_conf_timestamp : 3926
> Host timestamp : 3925
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=3925 (Thu Mar 7 15:34:45 2019)
> host-id=2
> score=3400
> vm_conf_refresh_time=3926 (Thu Mar 7 15:34:45 2019)
> conf_on_shared_storage=True
> maintenance=False
> state=GlobalMaintenance
> stopped=False
>
>
> !! Cluster is in GLOBAL MAINTENANCE mode !!
>
> [root@ovirt1 ovirt-hosted-engine-ha]# virsh list --all
> Id Name State
> ----------------------------------------------------
> - HostedEngine shut off
>
> I am really puzzled why both volumes are wiped out.
>
This is really scary: can you please double-check the gluster logs for
warnings and errors?
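Something along these lines should surface them, assuming the default log
location under /var/log/glusterfs and the usual log format where the severity
letter (E/W) follows the timestamp:

  # list error and warning entries from the gluster client and brick logs
  grep -E '\] (E|W) \[' /var/log/glusterfs/*.log /var/log/glusterfs/bricks/*.log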
>
>
> Best Regards,
> Strahil Nikolov
>
>
>