Change the IDs so they are distinct. I need to check whether there is a way to read the SPM IDs from the engine, as using the same numbers would be the best option.
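A minimal way to check, using the paths from your logs (the psql query is a sketch based on the usual engine schema, so verify the table name before relying on it):

--------------------8<-------------------
# on each host: the id this host uses for the hosted engine lockspace
grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf

# the lockspaces sanlock is actually holding on that host
sanlock client status

# on the engine VM: the SPM ids the engine assigned to each host
sudo -u postgres psql engine -c 'select vds_id, vds_spm_id from vds_spm_id_map;'
--------------------8<-------------------

If two hosts show the same host_id, change one of them so every host is unique (ideally matching the engine's vds_spm_id for that host), then restart ovirt-ha-agent and ovirt-ha-broker on the host you changed.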
Martin On Thu, Jun 29, 2017 at 12:46 PM, cmc <[email protected]> wrote: > Is there any way of recovering from this situation? I'd prefer to fix > the issue rather than re-deploy, but if there is no recovery path, I > could perhaps try re-deploying the hosted engine. In which case, would > the best option be to take a backup of the Hosted Engine, and then > shut it down, re-initialise the SAN partition (or use another > partition) and retry the deployment? Would it be better to use the > older backup from the bare metal engine that I originally used, or use > a backup from the Hosted Engine? I'm not sure if any VMs have been > added since switching to Hosted Engine. > > Unfortunately I have very little time left to get this working before > I have to hand it over for eval (by end of Friday). > > Here are some log snippets from the cluster that are current > > In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine: > > 2017-06-29 10:50:15,071+0100 INFO (monitor/207221b) [storage.SANLock] > Acquiring host id for domain 207221b2-959b-426b-b945-18e1adfed62f (id: > 3) (clusterlock:282) > 2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) [storage.Monitor] > Error acquiring host id 3 for domain > 207221b2-959b-426b-b945-18e1adfed62f (monitor:558) > Traceback (most recent call last): > File "/usr/share/vdsm/storage/monitor.py", line 555, in _acquireHostId > self.domain.acquireHostId(self.hostId, async=True) > File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId > self._manifest.acquireHostId(hostId, async) > File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId > self._domainLock.acquireHostId(hostId, async) > File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", > line 297, in acquireHostId > raise se.AcquireHostIdFailure(self._sdUUID, e) > AcquireHostIdFailure: Cannot acquire host id: > ('207221b2-959b-426b-b945-18e1adfed62f', SanlockException(22, 'Sanlock > lockspace add failure', 'Invalid argument')) > > From /var/log/ovirt-hosted-engine-ha/agent.log on the same host: > > MainThread::ERROR::2017-06-19 > 13:30:50,592::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) > Failed to start monitoring domain > (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout > during domain acquisition > MainThread::WARNING::2017-06-19 > 13:30:50,593::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) > Error while monitoring engine: Failed to start monitoring domain > (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout > during domain acquisition > MainThread::WARNING::2017-06-19 > 13:30:50,593::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) > Unexpected error > Traceback (most recent call last): > File > "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", > line 443, in start_monitoring > self._initialize_domain_monitor() > File > "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", > line 823, in _initialize_domain_monitor > raise Exception(msg) > Exception: Failed to start monitoring domain > (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout > during domain acquisition > MainThread::ERROR::2017-06-19 > 13:30:50,593::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) > Shutting down the agent because of 3 failures in a row! 
> > From sanlock.log: > > 2017-06-29 11:17:06+0100 1194149 [2530]: add_lockspace > 207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 > conflicts with name of list1 s5 > 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 > > From the two other hosts: > > host 2: > > vdsm.log > > 2017-06-29 10:53:47,755+0100 ERROR (jsonrpc/4) [jsonrpc.JsonRpcServer] > Internal server error (__init__:570) > Traceback (most recent call last): > File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line > 565, in _handle_request > res = method(**params) > File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line > 202, in _dynamicMethod > result = fn(*methodArgs) > File "/usr/share/vdsm/API.py", line 1454, in getAllVmIoTunePolicies > io_tune_policies_dict = self._cif.getAllVmIoTunePolicies() > File "/usr/share/vdsm/clientIF.py", line 448, in getAllVmIoTunePolicies > 'current_values': v.getIoTune()} > File "/usr/share/vdsm/virt/vm.py", line 2803, in getIoTune > result = self.getIoTuneResponse() > File "/usr/share/vdsm/virt/vm.py", line 2816, in getIoTuneResponse > res = self._dom.blockIoTune( > File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line > 47, in __getattr__ > % self.vmid) > NotConnectedError: VM u'a79e6b0e-fff4-4cba-a02c-4c00be151300' was not > started yet or was shut down > > /var/log/ovirt-hosted-engine-ha/agent.log > > MainThread::INFO::2017-06-29 > 10:56:33,636::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) > Found OVF_STORE: imgUUID:222610db-7880-4f4f-8559-a3635fd73555, > volUUID:c6e0d29b-eabf-4a09-a330-df54cfdd73f1 > MainThread::INFO::2017-06-29 > 10:56:33,926::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) > Extracting Engine VM OVF from the OVF_STORE > MainThread::INFO::2017-06-29 > 10:56:33,938::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) > OVF_STORE volume path: > /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/images/222610db-7880-4f4f-8559-a3635fd73555/c6e0d29b-eabf-4a09-a330-df54cfdd73f1 > MainThread::INFO::2017-06-29 > 10:56:33,967::config::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) > Found an OVF for HE VM, trying to convert > MainThread::INFO::2017-06-29 > 10:56:33,971::config::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) > Got vm.conf from OVF_STORE > MainThread::INFO::2017-06-29 > 10:56:36,736::states::678::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) > Score is 0 due to unexpected vm shutdown at Thu Jun 29 10:53:59 2017 > MainThread::INFO::2017-06-29 > 10:56:36,736::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) > Current state EngineUnexpectedlyDown (score: 0) > MainThread::INFO::2017-06-29 > 10:56:46,772::config::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) > Reloading vm.conf from the shared storage domain > > /var/log/messages: > > Jun 29 10:53:46 kvm-ldn-02 kernel: dd: sending ioctl 80306d02 to a partition! 
> > > host 1: > > From /var/log/messages (these entries also appear in sanlock.log): > > Jun 29 11:01:02 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:02+0100 > 678325 [9132]: s4531 delta_acquire host_id 1 busy1 1 2 1193177 > 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03 > Jun 29 11:01:03 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:03+0100 > 678326 [24159]: s4531 add_lockspace fail result -262 > > /var/log/ovirt-hosted-engine-ha/agent.log: > > MainThread::ERROR::2017-06-27 > 15:21:01,143::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) > Failed to start monitoring domain > (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout > during domain acquisition > MainThread::WARNING::2017-06-27 > 15:21:01,144::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) > Error while monitoring engine: Failed to start monitoring domain > (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout > during domain acquisition > MainThread::WARNING::2017-06-27 > 15:21:01,144::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) > Unexpected error > Traceback (most recent call last): > File > "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", > line 443, in start_monitoring > self._initialize_domain_monitor() > File > "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", > line 823, in _initialize_domain_monitor > raise Exception(msg) > Exception: Failed to start monitoring domain > (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout > during domain acquisition > MainThread::ERROR::2017-06-27 > 15:21:01,144::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) > Shutting down the agent because of 3 failures in a row! > MainThread::INFO::2017-06-27 > 15:21:06,717::hosted_engine::848::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) > VDSM domain monitor status: PENDING > MainThread::INFO::2017-06-27 > 15:21:09,335::hosted_engine::776::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) > Failed to stop monitoring domain > (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f): Storage domain is > member of pool: u'domain=207221b2-959b-426b-b945-18e1adfed62f' > MainThread::INFO::2017-06-27 > 15:21:09,339::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(run) > Agent shutting down > > > Thanks for any help, > > > Cam > > > On Wed, Jun 28, 2017 at 11:25 AM, cmc <[email protected]> wrote: >> Hi Martin, >> >> yes, on two of the machines they have the same host_id. The other has >> a different host_id. >> >> To update since yesterday: I reinstalled and deployed Hosted Engine on >> the other host (so all three hosts in the cluster now have it >> installed). The second one I deployed said it was able to host the >> engine (unlike the first I reinstalled), so I tried putting the host >> with the Hosted Engine on it into maintenance to see if it would >> migrate over. It managed to move all VMs but the Hosted Engine. And >> now the host that said it was able to host the engine says >> 'unavailable due to HA score'. The host that it was trying to move >> from has been in 'preparing for maintenance' for the last 12 hours. >> >> The summary is: >> >> kvm-ldn-01 - one of the original, pre-Hosted Engine hosts, reinstalled >> with 'Deploy Hosted Engine'.
No icon saying it can host the Hosted >> Engine, host_id of '2' in /etc/ovirt-hosted-engine/hosted-engine.conf. >> 'add_lockspace' fails in sanlock.log >> >> kvm-ldn-02 - the other host that was pre-existing before Hosted Engine >> was created. Reinstalled with 'Deploy Hosted Engine'. Had an icon >> saying that it was able to host the Hosted Engine, but after migration >> was attempted when putting kvm-ldn-03 into maintenance, it reports: >> 'unavailable due to HA score'. It has a host_id of '1' in >> /etc/ovirt-hosted-engine/hosted-engine.conf. No errors in sanlock.log >> >> kvm-ldn-03 - this was the host I deployed Hosted Engine on, which was >> not part of the original cluster. I restored the bare-metal engine >> backup in the Hosted Engine on this host when deploying it, without >> error. It currently has the Hosted Engine on it (as the only VM after >> I put that host into maintenance to test the HA of Hosted Engine). >> Sanlock log shows conflicts >> >> I will look through all the logs for any other errors. Please let me >> know if you need any logs or other clarification/information. >> >> Thanks, >> >> Campbell >> >> On Wed, Jun 28, 2017 at 9:25 AM, Martin Sivak <[email protected]> wrote: >>> Hi, >>> >>> can you please check the contents of >>> /etc/ovirt-hosted-engine/hosted-engine.conf or >>> /etc/ovirt-hosted-engine-ha/agent.conf (I am not sure which one it is >>> right now) and search for host-id? >>> >>> Make sure the IDs are different. If they are not, then there is a bug >>> somewhere. >>> >>> Martin >>> >>> On Tue, Jun 27, 2017 at 6:26 PM, cmc <[email protected]> wrote: >>>> I see this on the host it is trying to migrate in /var/log/sanlock: >>>> >>>> 2017-06-27 17:10:40+0100 527703 [2407]: s3528 lockspace >>>> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 >>>> 2017-06-27 17:13:00+0100 527843 [27446]: s3528 delta_acquire host_id 1 >>>> busy1 1 2 1042692 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03 >>>> 2017-06-27 17:13:01+0100 527844 [2407]: s3528 add_lockspace fail result >>>> -262 >>>> >>>> The sanlock service is running. Why would this occur? >>>> >>>> Thanks, >>>> >>>> C >>>> >>>> On Tue, Jun 27, 2017 at 5:21 PM, cmc <[email protected]> wrote: >>>>> Hi Martin, >>>>> >>>>> Thanks for the reply. I have done this, and the deployment completed >>>>> without error. However, it still will not allow the Hosted Engine >>>>> to migrate to another host. The >>>>> /etc/ovirt-hosted-engine/hosted-engine.conf got created ok on the host >>>>> I re-installed, but the ovirt-ha-broker.service, though it starts, >>>>> reports: >>>>> >>>>> --------------------8<------------------- >>>>> >>>>> Jun 27 14:58:26 kvm-ldn-01 systemd[1]: Starting oVirt Hosted Engine >>>>> High Availability Communications Broker...
>>>>> Jun 27 14:58:27 kvm-ldn-01 ovirt-ha-broker[6101]: ovirt-ha-broker >>>>> ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR >>>>> Failed to read metadata from >>>>> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata >>>>> Traceback (most >>>>> recent call last): >>>>> File >>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", >>>>> line 129, in get_raw_stats_for_service_type >>>>> f = >>>>> os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC) >>>>> OSError: [Errno 2] >>>>> No such file or directory: >>>>> '/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata' >>>>> >>>>> --------------------8<------------------- >>>>> >>>>> I checked the path, and it exists. I can run 'less -f' on it fine. The >>>>> perms are slightly different on the host that is running the VM vs the >>>>> one that is reporting errors (600 vs 660), ownership is vdsm:qemu. Is >>>>> this a san locking issue? >>>>> >>>>> Thanks for any help, >>>>> >>>>> Cam >>>>> >>>>> On Tue, Jun 27, 2017 at 1:41 PM, Martin Sivak <[email protected]> wrote: >>>>>>> Should it be? It was not in the instructions for the migration from >>>>>>> bare-metal to Hosted VM >>>>>> >>>>>> The hosted engine will only migrate to hosts that have the services >>>>>> running. Please put one other host to maintenance and select Hosted >>>>>> engine action: DEPLOY in the reinstall dialog. >>>>>> >>>>>> Best regards >>>>>> >>>>>> Martin Sivak >>>>>> >>>>>> On Tue, Jun 27, 2017 at 1:23 PM, cmc <[email protected]> wrote: >>>>>>> I changed the 'os.other.devices.display.protocols.value.3.6 = >>>>>>> spice/qxl,vnc/cirrus,vnc/qxl' line to have the same display protocols >>>>>>> as 4 and the hosted engine now appears in the list of VMs. I am >>>>>>> guessing the compatibility version was causing it to use the 3.6 >>>>>>> version. However, I am still unable to migrate the engine VM to >>>>>>> another host. When I try putting the host it is currently on into >>>>>>> maintenance, it reports: >>>>>>> >>>>>>> Error while executing action: Cannot switch the Host(s) to Maintenance >>>>>>> mode. >>>>>>> There are no available hosts capable of running the engine VM. >>>>>>> >>>>>>> Running 'hosted-engine --vm-status' still shows 'Engine status: >>>>>>> unknown stale-data'. >>>>>>> >>>>>>> The ovirt-ha-broker service is only running on one host. It was set to >>>>>>> 'disabled' in systemd. It won't start as there is no >>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf on the other two hosts. >>>>>>> Should it be? It was not in the instructions for the migration from >>>>>>> bare-metal to Hosted VM >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Cam >>>>>>> >>>>>>> On Thu, Jun 22, 2017 at 1:07 PM, cmc <[email protected]> wrote: >>>>>>>> Hi Tomas, >>>>>>>> >>>>>>>> So in my /usr/share/ovirt-engine/conf/osinfo-defaults.properties on my >>>>>>>> engine VM, I have: >>>>>>>> >>>>>>>> os.other.devices.display.protocols.value = >>>>>>>> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus >>>>>>>> os.other.devices.display.protocols.value.3.6 = >>>>>>>> spice/qxl,vnc/cirrus,vnc/qxl >>>>>>>> >>>>>>>> That seems to match - I assume since this is 4.1, the 3.6 should not >>>>>>>> apply >>>>>>>> >>>>>>>> Is there somewhere else I should be looking? 
>>>>>>>> >> >>>>>>>> Thanks, >>>>>>>> >> >>>>>>>> Cam >>>>>>>> >> >>>>>>>> On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek <[email protected]> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek >>>>>>>>> <[email protected]> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> > On 22 Jun 2017, at 12:31, Martin Sivak <[email protected]> wrote: >>>>>>>>>> > >>>>>>>>>> > Tomas, what fields are needed in a VM to pass the check that causes >>>>>>>>>> > the following error? >>>>>>>>>> > >>>>>>>>>> >>>>> WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action >>>>>>>>>> >>>>> 'ImportVm' >>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS >>>>>>>>>> >>>>>>>>>> to match the OS and VM Display type;-) >>>>>>>>>> Configuration is in osinfo… e.g. if that is import from older >>>>>>>>>> releases on >>>>>>>>>> Linux this is typically caused by the change of cirrus to vga for >>>>>>>>>> non-SPICE >>>>>>>>>> VMs >>>>>>>>> >>>>>>>>> >>>>>>>>> yep, the default supported combinations for 4.0+ is this: >>>>>>>>> os.other.devices.display.protocols.value = >>>>>>>>> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> > >>>>>>>>>> > Thanks. >>>>>>>>>> > >>>>>>>>>> > On Thu, Jun 22, 2017 at 12:19 PM, cmc <[email protected]> wrote: >>>>>>>>>> >> Hi Martin, >>>>>>>>>> >> >>>>>>>>>> >>> >>>>>>>>>> >>> just as a random comment, do you still have the database backup >>>>>>>>>> >>> from >>>>>>>>>> >>> the bare metal -> VM attempt? It might be possible to just try >>>>>>>>>> >>> again >>>>>>>>>> >>> using it. Or in the worst case.. update the offending value there >>>>>>>>>> >>> before restoring it to the new engine instance. >>>>>>>>>> >> >>>>>>>>>> >> I still have the backup. I'd rather do the latter, as re-running >>>>>>>>>> >> the >>>>>>>>>> >> HE deployment is quite lengthy and involved (I have to >>>>>>>>>> >> re-initialise >>>>>>>>>> >> the FC storage each time). Do you know what the offending value(s) >>>>>>>>>> >> would be? Would it be in the Postgres DB or in a config file >>>>>>>>>> >> somewhere? >>>>>>>>>> >> >>>>>>>>>> >> Cheers, >>>>>>>>>> >> >>>>>>>>>> >> Cam >>>>>>>>>> >> >>>>>>>>>> >>> Regards >>>>>>>>>> >>> >>>>>>>>>> >>> Martin Sivak >>>>>>>>>> >>> >>>>>>>>>> >>> On Thu, Jun 22, 2017 at 11:39 AM, cmc <[email protected]> wrote: >>>>>>>>>> >>>> Hi Yanir, >>>>>>>>>> >>>> >>>>>>>>>> >>>> Thanks for the reply. >>>>>>>>>> >>>> >>>>>>>>>> >>>>> First of all, maybe a chain reaction of : >>>>>>>>>> >>>>> WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action >>>>>>>>>> >>>>> 'ImportVm' >>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS >>>>>>>>>> >>>>> is causing the hosted engine vm not to be set up correctly and >>>>>>>>>> >>>>> further >>>>>>>>>> >>>>> actions were made when the hosted engine vm wasn't in a stable >>>>>>>>>> >>>>> state. >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> As for now, are you trying to revert back to a previous/initial >>>>>>>>>> >>>>> state ? >>>>>>>>>> >>>> >>>>>>>>>> >>>> I'm not trying to revert it to a previous state for now.
This >>>>>>>>>> >>>> was a >>>>>>>>>> >>>> migration from a bare metal engine, and it didn't report any >>>>>>>>>> >>>> error >>>>>>>>>> >>>> during the migration. I'd had some problems on my first >>>>>>>>>> >>>> attempts at >>>>>>>>>> >>>> this migration, whereby it never completed (due to a proxy >>>>>>>>>> >>>> issue) but >>>>>>>>>> >>>> I managed to resolve this. Do you know of a way to get the >>>>>>>>>> >>>> Hosted >>>>>>>>>> >>>> Engine VM into a stable state, without rebuilding the entire >>>>>>>>>> >>>> cluster >>>>>>>>>> >>>> from scratch (since I have a lot of VMs on it)? >>>>>>>>>> >>>> >>>>>>>>>> >>>> Thanks for any help. >>>>>>>>>> >>>> >>>>>>>>>> >>>> Regards, >>>>>>>>>> >>>> >>>>>>>>>> >>>> Cam >>>>>>>>>> >>>> >>>>>>>>>> >>>>> Regards, >>>>>>>>>> >>>>> Yanir >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> On Wed, Jun 21, 2017 at 4:32 PM, cmc <[email protected]> wrote: >>>>>>>>>> >>>>>> >>>>>>>>>> >>>>>> Hi Jenny/Martin, >>>>>>>>>> >>>>>> >>>>>>>>>> >>>>>> Any idea what I can do here? The hosted engine VM has no log >>>>>>>>>> >>>>>> on any >>>>>>>>>> >>>>>> host in /var/log/libvirt/qemu, and I fear that if I need to >>>>>>>>>> >>>>>> put the >>>>>>>>>> >>>>>> host into maintenance, e.g., to upgrade it that I created it >>>>>>>>>> >>>>>> on >>>>>>>>>> >>>>>> (which >>>>>>>>>> >>>>>> I think is hosting it), or if it fails for any reason, it >>>>>>>>>> >>>>>> won't get >>>>>>>>>> >>>>>> migrated to another host, and I will not be able to manage the >>>>>>>>>> >>>>>> cluster. It seems to be a very dangerous position to be in. >>>>>>>>>> >>>>>> >>>>>>>>>> >>>>>> Thanks, >>>>>>>>>> >>>>>> >>>>>>>>>> >>>>>> Cam >>>>>>>>>> >>>>>> >>>>>>>>>> >>>>>> On Wed, Jun 21, 2017 at 11:48 AM, cmc <[email protected]> >>>>>>>>>> >>>>>> wrote: >>>>>>>>>> >>>>>>> Thanks Martin. The hosts are all part of the same cluster. >>>>>>>>>> >>>>>>> >>>>>>>>>> >>>>>>> I get these errors in the engine.log on the engine: >>>>>>>>>> >>>>>>> >>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z WARN >>>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action >>>>>>>>>> >>>>>>> 'ImportVm' >>>>>>>>>> >>>>>>> failed for user SYST >>>>>>>>>> >>>>>>> EM. Reasons: >>>>>>>>>> >>>>>>> >>>>>>>>>> >>>>>>> VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS >>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z INFO >>>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Lock freed to object >>>>>>>>>> >>>>>>> 'EngineLock:{exclusiveLocks='[a >>>>>>>>>> >>>>>>> 79e6b0e-fff4-4cba-a02c-4c00be151300=<VM, >>>>>>>>>> >>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>, >>>>>>>>>> >>>>>>> HostedEngine=<VM_NAME, >>>>>>>>>> >>>>>>> ACTION_TYPE_FAILED_NAME_ALREADY_USED>]', >>>>>>>>>> >>>>>>> sharedLocks= >>>>>>>>>> >>>>>>> '[a79e6b0e-fff4-4cba-a02c-4c00be151300=<REMOTE_VM, >>>>>>>>>> >>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName >>>>>>>>>> >>>>>>> HostedEngine>]'}' >>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z ERROR >>>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.HostedEngineImporter] >>>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Failed importing the >>>>>>>>>> >>>>>>> Hosted >>>>>>>>>> >>>>>>> Engine VM >>>>>>>>>> >>>>>>> >>>>>>>>>> >>>>>>> The sanlock.log reports conflicts on that same host, and a >>>>>>>>>> >>>>>>> different >>>>>>>>>> >>>>>>> error on the other hosts, not sure if they are related. 
>>>>>>>>>> >>>>>>> >>>>>>>>>> >>>>>>> And this in the /var/log/ovirt-hosted-engine-ha/agent log on >>>>>>>>>> >>>>>>> the >>>>>>>>>> >>>>>>> host >>>>>>>>>> >>>>>>> which I deployed the hosted engine VM on: >>>>>>>>>> >>>>>>> >>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>> >>>>>>> >>>>>>>>>> >>>>>>> >>>>>>>>>> >>>>>>> 13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>>>>>> >>>>>>> Unable to extract HEVM OVF >>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>> >>>>>>> >>>>>>>>>> >>>>>>> >>>>>>>>>> >>>>>>> 13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>>>>>> >>>>>>> Failed extracting VM OVF from the OVF_STORE volume, falling >>>>>>>>>> >>>>>>> back >>>>>>>>>> >>>>>>> to >>>>>>>>>> >>>>>>> initial vm.conf >>>>>>>>>> >>>>>>> >>>>>>>>>> >>>>>>> I've seen some of these issues reported in bugzilla, but >>>>>>>>>> >>>>>>> they were >>>>>>>>>> >>>>>>> for >>>>>>>>>> >>>>>>> older versions of oVirt (and appear to be resolved). >>>>>>>>>> >>>>>>> >>>>>>>>>> >>>>>>> I will install that package on the other two hosts, for >>>>>>>>>> >>>>>>> which I >>>>>>>>>> >>>>>>> will >>>>>>>>>> >>>>>>> put them in maintenance as vdsm is installed as an upgrade. I >>>>>>>>>> >>>>>>> guess >>>>>>>>>> >>>>>>> restarting vdsm is a good idea after that? >>>>>>>>>> >>>>>>> >>>>>>>>>> >>>>>>> Thanks, >>>>>>>>>> >>>>>>> >>>>>>>>>> >>>>>>> Campbell >>>>>>>>>> >>>>>>> >>>>>>>>>> >>>>>>> On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak >>>>>>>>>> >>>>>>> <[email protected]> >>>>>>>>>> >>>>>>> wrote: >>>>>>>>>> >>>>>>>> Hi, >>>>>>>>>> >>>>>>>> >>>>>>>>>> >>>>>>>> you do not have to install it on all hosts. But you should >>>>>>>>>> >>>>>>>> have >>>>>>>>>> >>>>>>>> more >>>>>>>>>> >>>>>>>> than one and ideally all hosted engine enabled nodes should >>>>>>>>>> >>>>>>>> belong to >>>>>>>>>> >>>>>>>> the same engine cluster. >>>>>>>>>> >>>>>>>> >>>>>>>>>> >>>>>>>> Best regards >>>>>>>>>> >>>>>>>> >>>>>>>>>> >>>>>>>> Martin Sivak >>>>>>>>>> >>>>>>>> >>>>>>>>>> >>>>>>>> On Wed, Jun 21, 2017 at 11:29 AM, cmc <[email protected]> >>>>>>>>>> >>>>>>>> wrote: >>>>>>>>>> >>>>>>>>> Hi Jenny, >>>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>> Does ovirt-hosted-engine-ha need to be installed across all >>>>>>>>>> >>>>>>>>> hosts? >>>>>>>>>> >>>>>>>>> Could that be the reason it is failing to see it properly? >>>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>> Cam >>>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>> On Mon, Jun 19, 2017 at 1:27 PM, cmc <[email protected]> >>>>>>>>>> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Hi Jenny, >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Logs are attached. I can see errors in there, but am >>>>>>>>>> >>>>>>>>>> unsure how >>>>>>>>>> >>>>>>>>>> they >>>>>>>>>> >>>>>>>>>> arose. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Campbell >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, Jun 19, 2017 at 12:29 PM, Evgenia Tokar >>>>>>>>>> >>>>>>>>>> <[email protected]> >>>>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> From the output it looks like the agent is down, try >>>>>>>>>> >>>>>>>>>>> starting >>>>>>>>>> >>>>>>>>>>> it by >>>>>>>>>> >>>>>>>>>>> running: >>>>>>>>>> >>>>>>>>>>> systemctl start ovirt-ha-agent. 
>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> The engine is supposed to see the hosted engine storage >>>>>>>>>> >>>>>>>>>>> domain >>>>>>>>>> >>>>>>>>>>> and >>>>>>>>>> >>>>>>>>>>> import it >>>>>>>>>> >>>>>>>>>>> to the system, then it should import the hosted engine >>>>>>>>>> >>>>>>>>>>> vm. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Can you attach the agent log from the host >>>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-hosted-engine-ha/agent.log) >>>>>>>>>> >>>>>>>>>>> and the engine log from the engine vm >>>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-engine/engine.log)? >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>>> Jenny >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Mon, Jun 19, 2017 at 12:41 PM, cmc <[email protected]> >>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> Hi Jenny, >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>> What version are you running? >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> 4.1.2.2-1.el7.centos >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>> For the hosted engine vm to be imported and displayed >>>>>>>>>> >>>>>>>>>>>>> in the >>>>>>>>>> >>>>>>>>>>>>> engine, you >>>>>>>>>> >>>>>>>>>>>>> must first create a master storage domain. >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> To provide a bit more detail: this was a migration of a >>>>>>>>>> >>>>>>>>>>>> bare-metal >>>>>>>>>> >>>>>>>>>>>> engine in an existing cluster to a hosted engine VM for >>>>>>>>>> >>>>>>>>>>>> that >>>>>>>>>> >>>>>>>>>>>> cluster. >>>>>>>>>> >>>>>>>>>>>> As part of this migration, I built an entirely new host >>>>>>>>>> >>>>>>>>>>>> and >>>>>>>>>> >>>>>>>>>>>> ran >>>>>>>>>> >>>>>>>>>>>> 'hosted-engine --deploy' (followed these instructions: >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_Metal_to_an_EL-Based_Self-Hosted_Environment/). >>>>>>>>>> >>>>>>>>>>>> I restored the backup from the engine and it completed >>>>>>>>>> >>>>>>>>>>>> without any >>>>>>>>>> >>>>>>>>>>>> errors. I didn't see any instructions regarding a master >>>>>>>>>> >>>>>>>>>>>> storage >>>>>>>>>> >>>>>>>>>>>> domain in the page above. The cluster has two existing >>>>>>>>>> >>>>>>>>>>>> master >>>>>>>>>> >>>>>>>>>>>> storage >>>>>>>>>> >>>>>>>>>>>> domains, one is fibre channel, which is up, and one ISO >>>>>>>>>> >>>>>>>>>>>> domain, >>>>>>>>>> >>>>>>>>>>>> which >>>>>>>>>> >>>>>>>>>>>> is currently offline. >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>> What do you mean the hosted engine commands are >>>>>>>>>> >>>>>>>>>>>>> failing? >>>>>>>>>> >>>>>>>>>>>>> What >>>>>>>>>> >>>>>>>>>>>>> happens >>>>>>>>>> >>>>>>>>>>>>> when >>>>>>>>>> >>>>>>>>>>>>> you run hosted-engine --vm-status now? 
>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> Interestingly, whereas when I ran it before, it exited >>>>>>>>>> >>>>>>>>>>>> with >>>>>>>>>> >>>>>>>>>>>> no >>>>>>>>>> >>>>>>>>>>>> output >>>>>>>>>> >>>>>>>>>>>> and a return code of '1', it now reports: >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> --== Host 1 status ==-- >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> conf_on_shared_storage : True >>>>>>>>>> >>>>>>>>>>>> Status up-to-date : False >>>>>>>>>> >>>>>>>>>>>> Hostname : >>>>>>>>>> >>>>>>>>>>>> kvm-ldn-03.ldn.fscfc.co.uk >>>>>>>>>> >>>>>>>>>>>> Host ID : 1 >>>>>>>>>> >>>>>>>>>>>> Engine status : unknown stale-data >>>>>>>>>> >>>>>>>>>>>> Score : 0 >>>>>>>>>> >>>>>>>>>>>> stopped : True >>>>>>>>>> >>>>>>>>>>>> Local maintenance : False >>>>>>>>>> >>>>>>>>>>>> crc32 : 0217f07b >>>>>>>>>> >>>>>>>>>>>> local_conf_timestamp : 2911 >>>>>>>>>> >>>>>>>>>>>> Host timestamp : 2897 >>>>>>>>>> >>>>>>>>>>>> Extra metadata (valid at timestamp): >>>>>>>>>> >>>>>>>>>>>> metadata_parse_version=1 >>>>>>>>>> >>>>>>>>>>>> metadata_feature_version=1 >>>>>>>>>> >>>>>>>>>>>> timestamp=2897 (Thu Jun 15 16:22:54 2017) >>>>>>>>>> >>>>>>>>>>>> host-id=1 >>>>>>>>>> >>>>>>>>>>>> score=0 >>>>>>>>>> >>>>>>>>>>>> vm_conf_refresh_time=2911 (Thu Jun 15 16:23:08 >>>>>>>>>> >>>>>>>>>>>> 2017) >>>>>>>>>> >>>>>>>>>>>> conf_on_shared_storage=True >>>>>>>>>> >>>>>>>>>>>> maintenance=False >>>>>>>>>> >>>>>>>>>>>> state=AgentStopped >>>>>>>>>> >>>>>>>>>>>> stopped=True >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> Yet I can login to the web GUI fine. I guess it is not >>>>>>>>>> >>>>>>>>>>>> HA due >>>>>>>>>> >>>>>>>>>>>> to >>>>>>>>>> >>>>>>>>>>>> being >>>>>>>>>> >>>>>>>>>>>> in an unknown state currently? Does the >>>>>>>>>> >>>>>>>>>>>> hosted-engine-ha rpm >>>>>>>>>> >>>>>>>>>>>> need >>>>>>>>>> >>>>>>>>>>>> to >>>>>>>>>> >>>>>>>>>>>> be installed across all nodes in the cluster, btw? >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> Thanks for the help, >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> Cam >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>> Jenny Tokar >>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>> On Thu, Jun 15, 2017 at 6:32 PM, cmc >>>>>>>>>> >>>>>>>>>>>>> <[email protected]> >>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>> I've migrated from a bare-metal engine to a hosted >>>>>>>>>> >>>>>>>>>>>>>> engine. >>>>>>>>>> >>>>>>>>>>>>>> There >>>>>>>>>> >>>>>>>>>>>>>> were >>>>>>>>>> >>>>>>>>>>>>>> no errors during the install, however, the hosted >>>>>>>>>> >>>>>>>>>>>>>> engine >>>>>>>>>> >>>>>>>>>>>>>> did not >>>>>>>>>> >>>>>>>>>>>>>> get >>>>>>>>>> >>>>>>>>>>>>>> started. I tried running: >>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>> hosted-engine --status >>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>> on the host I deployed it on, and it returns nothing >>>>>>>>>> >>>>>>>>>>>>>> (exit >>>>>>>>>> >>>>>>>>>>>>>> code >>>>>>>>>> >>>>>>>>>>>>>> is 1 >>>>>>>>>> >>>>>>>>>>>>>> however). I could not ping it either. So I tried >>>>>>>>>> >>>>>>>>>>>>>> starting >>>>>>>>>> >>>>>>>>>>>>>> it via >>>>>>>>>> >>>>>>>>>>>>>> 'hosted-engine --vm-start' and it returned: >>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>> Virtual machine does not exist >>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>> But it then became available. I logged into it >>>>>>>>>> >>>>>>>>>>>>>> successfully. 
It >>>>>>>>>> >>>>>>>>>>>>>> is not >>>>>>>>>> >>>>>>>>>>>>>> in the list of VMs however. >>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>> Any ideas why the hosted-engine commands fail, and >>>>>>>>>> >>>>>>>>>>>>>> why it >>>>>>>>>> >>>>>>>>>>>>>> is not >>>>>>>>>> >>>>>>>>>>>>>> in >>>>>>>>>> >>>>>>>>>>>>>> the list of virtual machines? >>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>> Thanks for any help, >>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>> Cam

