Hi,

> Just to clarify: you mean the host_id in
> /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id,
> correct?
Exactly. Put the cluster to global maintenance first. Or kill all
agents (has the same effect).

Martin

On Fri, Jun 30, 2017 at 12:47 PM, cmc <[email protected]> wrote:
> Just to clarify: you mean the host_id in
> /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id,
> correct?
>
> On Fri, Jun 30, 2017 at 9:47 AM, Martin Sivak <[email protected]> wrote:
>> Hi,
>>
>> cleaning metadata won't help in this case. Try transferring the
>> spm_ids you got from the engine to the proper hosted engine hosts so
>> the hosted engine ids match the spm_ids. Then restart all hosted
>> engine services. I would actually recommend restarting all hosts after
>> this change, but I have no idea how many VMs you have running.
>>
>> Martin
>>
>> On Thu, Jun 29, 2017 at 8:27 PM, cmc <[email protected]> wrote:
>>> Tried running a 'hosted-engine --clean-metadata" as per
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1350539, since
>>> ovirt-ha-agent was not running anyway, but it fails with the following
>>> error:
>>>
>>> ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed
>>> to start monitoring domain
>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>> during domain acquisition
>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Traceback (most recent call last):
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
>>>     return action(he)
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 67, in action_clean
>>>     return he.clean(options.force_cleanup)
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 345, in clean
>>>     self._initialize_domain_monitor()
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 823, in _initialize_domain_monitor
>>>     raise Exception(msg)
>>> Exception: Failed to start monitoring domain
>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>> during domain acquisition
>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Trying to restart agent
>>> WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '0'
>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors
>>> occurred, giving up. Please review the log and consider filing a bug.
>>> INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down
>>>
>>> On Thu, Jun 29, 2017 at 6:10 PM, cmc <[email protected]> wrote:
>>>> Actually, it looks like sanlock problems:
>>>>
>>>> "SanlockInitializationError: Failed to initialize sanlock, the
>>>> number of errors has exceeded the limit"
>>>>
>>>> On Thu, Jun 29, 2017 at 5:10 PM, cmc <[email protected]> wrote:
>>>>> Sorry, I am mistaken, two hosts failed for the agent with the following
>>>>> error:
>>>>>
>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
>>>>> ERROR Failed to start monitoring domain
>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>>>> during domain acquisition
>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
>>>>> ERROR Shutting down the agent because of 3 failures in a row!
>>>>>
>>>>> What could cause these timeouts? Some other service not running?
>>>>>
>>>>> On Thu, Jun 29, 2017 at 5:03 PM, cmc <[email protected]> wrote:
>>>>>> Both services are up on all three hosts.
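
A minimal sketch of the sequence described above, assuming root shell access on the hosts and on the engine VM, and assuming the engine keeps the SPM ids in the vds_spm_id_map table of the default 'engine' database (table and column names are from memory and may differ between versions):

    # on one host: stop the HA agents from acting while the ids are changed
    hosted-engine --set-maintenance --mode=global

    # on the engine VM: list the SPM id the engine assigned to each host
    sudo -u postgres psql engine -c \
        "SELECT s.vds_name, m.vds_spm_id
           FROM vds_spm_id_map m
           JOIN vds_static s ON s.vds_id = m.vds_id;"

    # on each hosted-engine host: compare with the local hosted-engine id
    grep ^host_id= /etc/ovirt-hosted-engine/hosted-engine.conf
    # edit host_id= so it matches that host's vds_spm_id, then:
    systemctl restart ovirt-ha-broker ovirt-ha-agent

    # once every host has a distinct, matching id, leave maintenance
    hosted-engine --set-maintenance --mode=none
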
The broke logs just report: >>>>>> >>>>>> Thread-6549::INFO::2017-06-29 >>>>>> 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) >>>>>> Connection established >>>>>> Thread-6549::INFO::2017-06-29 >>>>>> 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) >>>>>> Connection closed >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Cam >>>>>> >>>>>> On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak <[email protected]> wrote: >>>>>>> Hi, >>>>>>> >>>>>>> please make sure that both ovirt-ha-agent and ovirt-ha-broker services >>>>>>> are restarted and up. The error says the agent can't talk to the >>>>>>> broker. Is there anything in the broker.log? >>>>>>> >>>>>>> Best regards >>>>>>> >>>>>>> Martin Sivak >>>>>>> >>>>>>> On Thu, Jun 29, 2017 at 4:42 PM, cmc <[email protected]> wrote: >>>>>>>> I've restarted those two services across all hosts, have taken the >>>>>>>> Hosted Engine host out of maintenance, and when I try to migrate the >>>>>>>> Hosted Engine over to another host, it reports that all three hosts >>>>>>>> 'did not satisfy internal filter HA because it is not a Hosted Engine >>>>>>>> host'. >>>>>>>> >>>>>>>> On the host that the Hosted Engine is currently on it reports in the >>>>>>>> agent.log: >>>>>>>> >>>>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR >>>>>>>> Connection closed: Connection closed >>>>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent >>>>>>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception >>>>>>>> getting service path: Connection closed >>>>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent >>>>>>>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent >>>>>>>> call last): >>>>>>>> File >>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>>>>>> line 191, in _run_agent >>>>>>>> return action(he) >>>>>>>> File >>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>>>>>> line 64, in action_proper >>>>>>>> return >>>>>>>> he.start_monitoring() >>>>>>>> File >>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>> line 411, in start_monitoring >>>>>>>> >>>>>>>> self._initialize_sanlock() >>>>>>>> File >>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>> line 691, in _initialize_sanlock >>>>>>>> >>>>>>>> constants.SERVICE_TYPE + constants.LOCKSPACE_EXTENSION) >>>>>>>> File >>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", >>>>>>>> line 162, in get_service_path >>>>>>>> .format(str(e))) >>>>>>>> RequestError: Failed >>>>>>>> to get service path: Connection closed >>>>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent >>>>>>>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent >>>>>>>> >>>>>>>> On Thu, Jun 29, 2017 at 1:25 PM, Martin Sivak <[email protected]> >>>>>>>> wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> yep, you have to restart the ovirt-ha-agent and ovirt-ha-broker >>>>>>>>> services. >>>>>>>>> >>>>>>>>> The scheduling message just means that the host has score 0 or is not >>>>>>>>> reporting score at all. >>>>>>>>> >>>>>>>>> Martin >>>>>>>>> >>>>>>>>> On Thu, Jun 29, 2017 at 1:33 PM, cmc <[email protected]> wrote: >>>>>>>>>> Thanks Martin, do I have to restart anything? 
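
If it helps, the restart being discussed is just the two HA services on each hosted-engine host; one way to do it and watch both logs while the agent reconnects to the broker (log locations assumed to be the defaults under /var/log/ovirt-hosted-engine-ha) is:

    systemctl restart ovirt-ha-broker ovirt-ha-agent
    systemctl status ovirt-ha-broker ovirt-ha-agent
    tail -f /var/log/ovirt-hosted-engine-ha/broker.log \
            /var/log/ovirt-hosted-engine-ha/agent.log
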
When I try to use the >>>>>>>>>> 'migrate' operation, it complains that the other two hosts 'did not >>>>>>>>>> satisfy internal filter HA because it is not a Hosted Engine host..' >>>>>>>>>> (even though I reinstalled both these hosts with the 'deploy hosted >>>>>>>>>> engine' option, which suggests that something needs restarting. >>>>>>>>>> Should >>>>>>>>>> I worry about the sanlock errors, or will that be resolved by the >>>>>>>>>> change in host_id? >>>>>>>>>> >>>>>>>>>> Kind regards, >>>>>>>>>> >>>>>>>>>> Cam >>>>>>>>>> >>>>>>>>>> On Thu, Jun 29, 2017 at 12:22 PM, Martin Sivak <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>>> Change the ids so they are distinct. I need to check if there is a >>>>>>>>>>> way >>>>>>>>>>> to read the SPM ids from the engine as using the same numbers would >>>>>>>>>>> be >>>>>>>>>>> the best. >>>>>>>>>>> >>>>>>>>>>> Martin >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Jun 29, 2017 at 12:46 PM, cmc <[email protected]> wrote: >>>>>>>>>>>> Is there any way of recovering from this situation? I'd prefer to >>>>>>>>>>>> fix >>>>>>>>>>>> the issue rather than re-deploy, but if there is no recovery path, >>>>>>>>>>>> I >>>>>>>>>>>> could perhaps try re-deploying the hosted engine. In which case, >>>>>>>>>>>> would >>>>>>>>>>>> the best option be to take a backup of the Hosted Engine, and then >>>>>>>>>>>> shut it down, re-initialise the SAN partition (or use another >>>>>>>>>>>> partition) and retry the deployment? Would it be better to use the >>>>>>>>>>>> older backup from the bare metal engine that I originally used, or >>>>>>>>>>>> use >>>>>>>>>>>> a backup from the Hosted Engine? I'm not sure if any VMs have been >>>>>>>>>>>> added since switching to Hosted Engine. >>>>>>>>>>>> >>>>>>>>>>>> Unfortunately I have very little time left to get this working >>>>>>>>>>>> before >>>>>>>>>>>> I have to hand it over for eval (by end of Friday). 
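
On the re-deploy question above: whichever backup ends up being restored, it is worth taking a fresh one from inside the current engine VM before touching the storage; a minimal sketch (file names are only examples) is:

    # inside the engine VM
    engine-backup --mode=backup --scope=all \
        --file=/root/engine-backup-$(date +%F).tar.gz \
        --log=/root/engine-backup-$(date +%F).log
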
>>>>>>>>>>>> >>>>>>>>>>>> Here are some log snippets from the cluster that are current >>>>>>>>>>>> >>>>>>>>>>>> In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine: >>>>>>>>>>>> >>>>>>>>>>>> 2017-06-29 10:50:15,071+0100 INFO (monitor/207221b) >>>>>>>>>>>> [storage.SANLock] >>>>>>>>>>>> Acquiring host id for domain 207221b2-959b-426b-b945-18e1adfed62f >>>>>>>>>>>> (id: >>>>>>>>>>>> 3) (clusterlock:282) >>>>>>>>>>>> 2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) >>>>>>>>>>>> [storage.Monitor] >>>>>>>>>>>> Error acquiring host id 3 for domain >>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f (monitor:558) >>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>> File "/usr/share/vdsm/storage/monitor.py", line 555, in >>>>>>>>>>>> _acquireHostId >>>>>>>>>>>> self.domain.acquireHostId(self.hostId, async=True) >>>>>>>>>>>> File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId >>>>>>>>>>>> self._manifest.acquireHostId(hostId, async) >>>>>>>>>>>> File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId >>>>>>>>>>>> self._domainLock.acquireHostId(hostId, async) >>>>>>>>>>>> File >>>>>>>>>>>> "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", >>>>>>>>>>>> line 297, in acquireHostId >>>>>>>>>>>> raise se.AcquireHostIdFailure(self._sdUUID, e) >>>>>>>>>>>> AcquireHostIdFailure: Cannot acquire host id: >>>>>>>>>>>> ('207221b2-959b-426b-b945-18e1adfed62f', SanlockException(22, >>>>>>>>>>>> 'Sanlock >>>>>>>>>>>> lockspace add failure', 'Invalid argument')) >>>>>>>>>>>> >>>>>>>>>>>> From /var/log/ovirt-hosted-engine-ha/agent.log on the same host: >>>>>>>>>>>> >>>>>>>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>>>> 13:30:50,592::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) >>>>>>>>>>>> Failed to start monitoring domain >>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>>>>>>>> during domain acquisition >>>>>>>>>>>> MainThread::WARNING::2017-06-19 >>>>>>>>>>>> 13:30:50,593::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>> Error while monitoring engine: Failed to start monitoring domain >>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>>>>>>>> during domain acquisition >>>>>>>>>>>> MainThread::WARNING::2017-06-19 >>>>>>>>>>>> 13:30:50,593::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>> Unexpected error >>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>> File >>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>>> line 443, in start_monitoring >>>>>>>>>>>> self._initialize_domain_monitor() >>>>>>>>>>>> File >>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>>> line 823, in _initialize_domain_monitor >>>>>>>>>>>> raise Exception(msg) >>>>>>>>>>>> Exception: Failed to start monitoring domain >>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>>>>>>>> during domain acquisition >>>>>>>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>>>> 13:30:50,593::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>> Shutting down the agent because of 3 failures in a row! 
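
For the 'Sanlock lockspace add failure' above, it can help to see what sanlock itself thinks it holds before changing anything; a sketch, using the ids volume path that appears in the sanlock.log entries below:

    # lockspaces and resources this host currently holds
    sanlock client status

    # dump the delta leases on the hosted-engine ids volume to see
    # which host_id each registered host has written there
    sanlock direct dump /dev/207221b2-959b-426b-b945-18e1adfed62f/ids
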
>>>>>>>>>>>> >>>>>>>>>>>> From sanlock.log: >>>>>>>>>>>> >>>>>>>>>>>> 2017-06-29 11:17:06+0100 1194149 [2530]: add_lockspace >>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 >>>>>>>>>>>> conflicts with name of list1 s5 >>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 >>>>>>>>>>>> >>>>>>>>>>>> From the two other hosts: >>>>>>>>>>>> >>>>>>>>>>>> host 2: >>>>>>>>>>>> >>>>>>>>>>>> vdsm.log >>>>>>>>>>>> >>>>>>>>>>>> 2017-06-29 10:53:47,755+0100 ERROR (jsonrpc/4) >>>>>>>>>>>> [jsonrpc.JsonRpcServer] >>>>>>>>>>>> Internal server error (__init__:570) >>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>> File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", >>>>>>>>>>>> line >>>>>>>>>>>> 565, in _handle_request >>>>>>>>>>>> res = method(**params) >>>>>>>>>>>> File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line >>>>>>>>>>>> 202, in _dynamicMethod >>>>>>>>>>>> result = fn(*methodArgs) >>>>>>>>>>>> File "/usr/share/vdsm/API.py", line 1454, in >>>>>>>>>>>> getAllVmIoTunePolicies >>>>>>>>>>>> io_tune_policies_dict = self._cif.getAllVmIoTunePolicies() >>>>>>>>>>>> File "/usr/share/vdsm/clientIF.py", line 448, in >>>>>>>>>>>> getAllVmIoTunePolicies >>>>>>>>>>>> 'current_values': v.getIoTune()} >>>>>>>>>>>> File "/usr/share/vdsm/virt/vm.py", line 2803, in getIoTune >>>>>>>>>>>> result = self.getIoTuneResponse() >>>>>>>>>>>> File "/usr/share/vdsm/virt/vm.py", line 2816, in >>>>>>>>>>>> getIoTuneResponse >>>>>>>>>>>> res = self._dom.blockIoTune( >>>>>>>>>>>> File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", >>>>>>>>>>>> line >>>>>>>>>>>> 47, in __getattr__ >>>>>>>>>>>> % self.vmid) >>>>>>>>>>>> NotConnectedError: VM u'a79e6b0e-fff4-4cba-a02c-4c00be151300' was >>>>>>>>>>>> not >>>>>>>>>>>> started yet or was shut down >>>>>>>>>>>> >>>>>>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log >>>>>>>>>>>> >>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>> 10:56:33,636::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) >>>>>>>>>>>> Found OVF_STORE: imgUUID:222610db-7880-4f4f-8559-a3635fd73555, >>>>>>>>>>>> volUUID:c6e0d29b-eabf-4a09-a330-df54cfdd73f1 >>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>> 10:56:33,926::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>>>>>>>> Extracting Engine VM OVF from the OVF_STORE >>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>> 10:56:33,938::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>>>>>>>> OVF_STORE volume path: >>>>>>>>>>>> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/images/222610db-7880-4f4f-8559-a3635fd73555/c6e0d29b-eabf-4a09-a330-df54cfdd73f1 >>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>> 10:56:33,967::config::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>>>>>>>> Found an OVF for HE VM, trying to convert >>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>> 10:56:33,971::config::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>>>>>>>> Got vm.conf from OVF_STORE >>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>> 10:56:36,736::states::678::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) >>>>>>>>>>>> Score is 0 due to unexpected vm shutdown at Thu Jun 29 10:53:59 >>>>>>>>>>>> 2017 >>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>> 
10:56:36,736::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>> Current state EngineUnexpectedlyDown (score: 0) >>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>> 10:56:46,772::config::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) >>>>>>>>>>>> Reloading vm.conf from the shared storage domain >>>>>>>>>>>> >>>>>>>>>>>> /var/log/messages: >>>>>>>>>>>> >>>>>>>>>>>> Jun 29 10:53:46 kvm-ldn-02 kernel: dd: sending ioctl 80306d02 to a >>>>>>>>>>>> partition! >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> host 1: >>>>>>>>>>>> >>>>>>>>>>>> /var/log/messages also in sanlock.log >>>>>>>>>>>> >>>>>>>>>>>> Jun 29 11:01:02 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:02+0100 >>>>>>>>>>>> 678325 [9132]: s4531 delta_acquire host_id 1 busy1 1 2 1193177 >>>>>>>>>>>> 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03 >>>>>>>>>>>> Jun 29 11:01:03 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:03+0100 >>>>>>>>>>>> 678326 [24159]: s4531 add_lockspace fail result -262 >>>>>>>>>>>> >>>>>>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log: >>>>>>>>>>>> >>>>>>>>>>>> MainThread::ERROR::2017-06-27 >>>>>>>>>>>> 15:21:01,143::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) >>>>>>>>>>>> Failed to start monitoring domain >>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>>>>>>>> during domain acquisition >>>>>>>>>>>> MainThread::WARNING::2017-06-27 >>>>>>>>>>>> 15:21:01,144::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>> Error while monitoring engine: Failed to start monitoring domain >>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>>>>>>>> during domain acquisition >>>>>>>>>>>> MainThread::WARNING::2017-06-27 >>>>>>>>>>>> 15:21:01,144::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>> Unexpected error >>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>> File >>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>>> line 443, in start_monitoring >>>>>>>>>>>> self._initialize_domain_monitor() >>>>>>>>>>>> File >>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>>> line 823, in _initialize_domain_monitor >>>>>>>>>>>> raise Exception(msg) >>>>>>>>>>>> Exception: Failed to start monitoring domain >>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>>>>>>>> during domain acquisition >>>>>>>>>>>> MainThread::ERROR::2017-06-27 >>>>>>>>>>>> 15:21:01,144::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>> Shutting down the agent because of 3 failures in a row! 
>>>>>>>>>>>> MainThread::INFO::2017-06-27 >>>>>>>>>>>> 15:21:06,717::hosted_engine::848::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) >>>>>>>>>>>> VDSM domain monitor status: PENDING >>>>>>>>>>>> MainThread::INFO::2017-06-27 >>>>>>>>>>>> 15:21:09,335::hosted_engine::776::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) >>>>>>>>>>>> Failed to stop monitoring domain >>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f): Storage domain is >>>>>>>>>>>> member of pool: u'domain=207221b2-959b-426b-b945-18e1adfed62f' >>>>>>>>>>>> MainThread::INFO::2017-06-27 >>>>>>>>>>>> 15:21:09,339::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(run) >>>>>>>>>>>> Agent shutting down >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks for any help, >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Cam >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Jun 28, 2017 at 11:25 AM, cmc <[email protected]> wrote: >>>>>>>>>>>>> Hi Martin, >>>>>>>>>>>>> >>>>>>>>>>>>> yes, on two of the machines they have the same host_id. The other >>>>>>>>>>>>> has >>>>>>>>>>>>> a different host_id. >>>>>>>>>>>>> >>>>>>>>>>>>> To update since yesterday: I reinstalled and deployed Hosted >>>>>>>>>>>>> Engine on >>>>>>>>>>>>> the other host (so all three hosts in the cluster now have it >>>>>>>>>>>>> installed). The second one I deployed said it was able to host the >>>>>>>>>>>>> engine (unlike the first I reinstalled), so I tried putting the >>>>>>>>>>>>> host >>>>>>>>>>>>> with the Hosted Engine on it into maintenance to see if it would >>>>>>>>>>>>> migrate over. It managed to move all hosts but the Hosted Engine. >>>>>>>>>>>>> And >>>>>>>>>>>>> now the host that said it was able to host the engine says >>>>>>>>>>>>> 'unavailable due to HA score'. The host that it was trying to move >>>>>>>>>>>>> from is now in 'preparing for maintenance' for the last 12 hours. >>>>>>>>>>>>> >>>>>>>>>>>>> The summary is: >>>>>>>>>>>>> >>>>>>>>>>>>> kvm-ldn-01 - one of the original, pre-Hosted Engine hosts, >>>>>>>>>>>>> reinstalled >>>>>>>>>>>>> with 'Deploy Hosted Engine'. No icon saying it can host the Hosted >>>>>>>>>>>>> Hngine, host_id of '2' in >>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf. >>>>>>>>>>>>> 'add_lockspace' fails in sanlock.log >>>>>>>>>>>>> >>>>>>>>>>>>> kvm-ldn-02 - the other host that was pre-existing before Hosted >>>>>>>>>>>>> Engine >>>>>>>>>>>>> was created. Reinstalled with 'Deploy Hosted Engine'. Had an icon >>>>>>>>>>>>> saying that it was able to host the Hosted Engine, but after >>>>>>>>>>>>> migration >>>>>>>>>>>>> was attempted when putting kvm-ldn-03 into maintenance, it >>>>>>>>>>>>> reports: >>>>>>>>>>>>> 'unavailable due to HA score'. It has a host_id of '1' in >>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf. No errors in >>>>>>>>>>>>> sanlock.log >>>>>>>>>>>>> >>>>>>>>>>>>> kvm-ldn-03 - this was the host I deployed Hosted Engine on, which >>>>>>>>>>>>> was >>>>>>>>>>>>> not part of the original cluster. I restored the bare-metal engine >>>>>>>>>>>>> backup in the Hosted Engine on this host when deploying it, >>>>>>>>>>>>> without >>>>>>>>>>>>> error. It currently has the Hosted Engine on it (as the only VM >>>>>>>>>>>>> after >>>>>>>>>>>>> I put that host into maintenance to test the HA of Hosted Engine). >>>>>>>>>>>>> Sanlock log shows conflicts >>>>>>>>>>>>> >>>>>>>>>>>>> I will look through all the logs for any other errors. 
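
Given that summary, a quick way to see all three ids side by side (assuming root ssh from one convenient machine to the hosts named above):

    for h in kvm-ldn-01 kvm-ldn-02 kvm-ldn-03; do
        printf '%s: ' "$h"
        ssh root@"$h" grep ^host_id= /etc/ovirt-hosted-engine/hosted-engine.conf
    done
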
Please let >>>>>>>>>>>>> me >>>>>>>>>>>>> know if you need any logs or other clarification/information. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> Campbell >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Jun 28, 2017 at 9:25 AM, Martin Sivak <[email protected]> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> can you please check the contents of >>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf or >>>>>>>>>>>>>> /etc/ovirt-hosted-engine-ha/agent.conf (I am not sure which one >>>>>>>>>>>>>> it is >>>>>>>>>>>>>> right now) and search for host-id? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Make sure the IDs are different. If they are not, then there is >>>>>>>>>>>>>> a bug somewhere. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Martin >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 6:26 PM, cmc <[email protected]> wrote: >>>>>>>>>>>>>>> I see this on the host it is trying to migrate in >>>>>>>>>>>>>>> /var/log/sanlock: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2017-06-27 17:10:40+0100 527703 [2407]: s3528 lockspace >>>>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 >>>>>>>>>>>>>>> 2017-06-27 17:13:00+0100 527843 [27446]: s3528 delta_acquire >>>>>>>>>>>>>>> host_id 1 >>>>>>>>>>>>>>> busy1 1 2 1042692 >>>>>>>>>>>>>>> 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03 >>>>>>>>>>>>>>> 2017-06-27 17:13:01+0100 527844 [2407]: s3528 add_lockspace >>>>>>>>>>>>>>> fail result -262 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The sanlock service is running. Why would this occur? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> C >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 5:21 PM, cmc <[email protected]> wrote: >>>>>>>>>>>>>>>> Hi Martin, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks for the reply. I have done this, and the deployment >>>>>>>>>>>>>>>> completed >>>>>>>>>>>>>>>> without error. However, it still will not allow the Hosted >>>>>>>>>>>>>>>> Engine >>>>>>>>>>>>>>>> migrate to another host. The >>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf got created ok on >>>>>>>>>>>>>>>> the host >>>>>>>>>>>>>>>> I re-installed, but the ovirt-ha-broker.service, though it >>>>>>>>>>>>>>>> starts, >>>>>>>>>>>>>>>> reports: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> --------------------8<------------------- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Jun 27 14:58:26 kvm-ldn-01 systemd[1]: Starting oVirt Hosted >>>>>>>>>>>>>>>> Engine >>>>>>>>>>>>>>>> High Availability Communications Broker... 
>>>>>>>>>>>>>>>> Jun 27 14:58:27 kvm-ldn-01 ovirt-ha-broker[6101]: >>>>>>>>>>>>>>>> ovirt-ha-broker >>>>>>>>>>>>>>>> ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker >>>>>>>>>>>>>>>> ERROR >>>>>>>>>>>>>>>> Failed to read metadata from >>>>>>>>>>>>>>>> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata >>>>>>>>>>>>>>>> Traceback >>>>>>>>>>>>>>>> (most >>>>>>>>>>>>>>>> recent call last): >>>>>>>>>>>>>>>> File >>>>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", >>>>>>>>>>>>>>>> line 129, in get_raw_stats_for_service_type >>>>>>>>>>>>>>>> f = >>>>>>>>>>>>>>>> os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC) >>>>>>>>>>>>>>>> OSError: >>>>>>>>>>>>>>>> [Errno 2] >>>>>>>>>>>>>>>> No such file or directory: >>>>>>>>>>>>>>>> '/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata' >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> --------------------8<------------------- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I checked the path, and it exists. I can run 'less -f' on it >>>>>>>>>>>>>>>> fine. The >>>>>>>>>>>>>>>> perms are slightly different on the host that is running the >>>>>>>>>>>>>>>> VM vs the >>>>>>>>>>>>>>>> one that is reporting errors (600 vs 660), ownership is >>>>>>>>>>>>>>>> vdsm:qemu. Is >>>>>>>>>>>>>>>> this a san locking issue? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks for any help, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:41 PM, Martin Sivak >>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>> Should it be? It was not in the instructions for the >>>>>>>>>>>>>>>>>> migration from >>>>>>>>>>>>>>>>>> bare-metal to Hosted VM >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The hosted engine will only migrate to hosts that have the >>>>>>>>>>>>>>>>> services >>>>>>>>>>>>>>>>> running. Please put one other host to maintenance and select >>>>>>>>>>>>>>>>> Hosted >>>>>>>>>>>>>>>>> engine action: DEPLOY in the reinstall dialog. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best regards >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Martin Sivak >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:23 PM, cmc <[email protected]> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> I changed the 'os.other.devices.display.protocols.value.3.6 = >>>>>>>>>>>>>>>>>> spice/qxl,vnc/cirrus,vnc/qxl' line to have the same display >>>>>>>>>>>>>>>>>> protocols >>>>>>>>>>>>>>>>>> as 4 and the hosted engine now appears in the list of VMs. I >>>>>>>>>>>>>>>>>> am >>>>>>>>>>>>>>>>>> guessing the compatibility version was causing it to use the >>>>>>>>>>>>>>>>>> 3.6 >>>>>>>>>>>>>>>>>> version. However, I am still unable to migrate the engine VM >>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>> another host. When I try putting the host it is currently on >>>>>>>>>>>>>>>>>> into >>>>>>>>>>>>>>>>>> maintenance, it reports: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Error while executing action: Cannot switch the Host(s) to >>>>>>>>>>>>>>>>>> Maintenance mode. >>>>>>>>>>>>>>>>>> There are no available hosts capable of running the engine >>>>>>>>>>>>>>>>>> VM. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Running 'hosted-engine --vm-status' still shows 'Engine >>>>>>>>>>>>>>>>>> status: >>>>>>>>>>>>>>>>>> unknown stale-data'. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The ovirt-ha-broker service is only running on one host. It >>>>>>>>>>>>>>>>>> was set to >>>>>>>>>>>>>>>>>> 'disabled' in systemd. 
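
On the missing hosted-engine.metadata path above: on block storage the entries under ha_agent/ are normally just symlinks into the domain's images tree, created when the broker's storage is connected, so it is worth checking whether the links exist and point at readable volumes (a sketch, under that assumption):

    ls -l /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/

    # if the links are missing or dangling, reconnecting the hosted-engine
    # storage and restarting the HA services may recreate them
    hosted-engine --connect-storage
    systemctl restart ovirt-ha-broker ovirt-ha-agent
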
It won't start as there is no >>>>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf on the other two >>>>>>>>>>>>>>>>>> hosts. >>>>>>>>>>>>>>>>>> Should it be? It was not in the instructions for the >>>>>>>>>>>>>>>>>> migration from >>>>>>>>>>>>>>>>>> bare-metal to Hosted VM >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 1:07 PM, cmc <[email protected]> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>> Hi Tomas, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> So in my >>>>>>>>>>>>>>>>>>> /usr/share/ovirt-engine/conf/osinfo-defaults.properties on >>>>>>>>>>>>>>>>>>> my >>>>>>>>>>>>>>>>>>> engine VM, I have: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> os.other.devices.display.protocols.value = >>>>>>>>>>>>>>>>>>> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus >>>>>>>>>>>>>>>>>>> os.other.devices.display.protocols.value.3.6 = >>>>>>>>>>>>>>>>>>> spice/qxl,vnc/cirrus,vnc/qxl >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> That seems to match - I assume since this is 4.1, the 3.6 >>>>>>>>>>>>>>>>>>> should not apply >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Is there somewhere else I should be looking? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek >>>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek >>>>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> > On 22 Jun 2017, at 12:31, Martin Sivak >>>>>>>>>>>>>>>>>>>>> > <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > Tomas, what fields are needed in a VM to pass the check >>>>>>>>>>>>>>>>>>>>> > that causes >>>>>>>>>>>>>>>>>>>>> > the following error? >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> >>>>> WARN >>>>>>>>>>>>>>>>>>>>> >>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>>>>>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation >>>>>>>>>>>>>>>>>>>>> >>>>> of action >>>>>>>>>>>>>>>>>>>>> >>>>> 'ImportVm' >>>>>>>>>>>>>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT >>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> to match the OS and VM Display type;-) >>>>>>>>>>>>>>>>>>>>> Configuration is in osinfo….e.g. if that is import from >>>>>>>>>>>>>>>>>>>>> older releases on >>>>>>>>>>>>>>>>>>>>> Linux this is typically caused by the cahgen of cirrus to >>>>>>>>>>>>>>>>>>>>> vga for non-SPICE >>>>>>>>>>>>>>>>>>>>> VMs >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> yep, the default supported combinations for 4.0+ is this: >>>>>>>>>>>>>>>>>>>> os.other.devices.display.protocols.value = >>>>>>>>>>>>>>>>>>>> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > Thanks. >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > On Thu, Jun 22, 2017 at 12:19 PM, cmc >>>>>>>>>>>>>>>>>>>>> > <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >> Hi Martin, >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>> >>> just as a random comment, do you still have the >>>>>>>>>>>>>>>>>>>>> >>> database backup from >>>>>>>>>>>>>>>>>>>>> >>> the bare metal -> VM attempt? 
It might be possible to >>>>>>>>>>>>>>>>>>>>> >>> just try again >>>>>>>>>>>>>>>>>>>>> >>> using it. Or in the worst case.. update the offending >>>>>>>>>>>>>>>>>>>>> >>> value there >>>>>>>>>>>>>>>>>>>>> >>> before restoring it to the new engine instance. >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> >> I still have the backup. I'd rather do the latter, as >>>>>>>>>>>>>>>>>>>>> >> re-running the >>>>>>>>>>>>>>>>>>>>> >> HE deployment is quite lengthy and involved (I have to >>>>>>>>>>>>>>>>>>>>> >> re-initialise >>>>>>>>>>>>>>>>>>>>> >> the FC storage each time). Do you know what the >>>>>>>>>>>>>>>>>>>>> >> offending value(s) >>>>>>>>>>>>>>>>>>>>> >> would be? Would it be in the Postgres DB or in a >>>>>>>>>>>>>>>>>>>>> >> config file >>>>>>>>>>>>>>>>>>>>> >> somewhere? >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> >> Cheers, >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> >> Cam >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> >>> Regards >>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>> >>> Martin Sivak >>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>> >>> On Thu, Jun 22, 2017 at 11:39 AM, cmc >>>>>>>>>>>>>>>>>>>>> >>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >>>> Hi Yanir, >>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>> >>>> Thanks for the reply. >>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>> >>>>> First of all, maybe a chain reaction of : >>>>>>>>>>>>>>>>>>>>> >>>>> WARN >>>>>>>>>>>>>>>>>>>>> >>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>>>>>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation >>>>>>>>>>>>>>>>>>>>> >>>>> of action >>>>>>>>>>>>>>>>>>>>> >>>>> 'ImportVm' >>>>>>>>>>>>>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT >>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS >>>>>>>>>>>>>>>>>>>>> >>>>> is causing the hosted engine vm not to be set up >>>>>>>>>>>>>>>>>>>>> >>>>> correctly and >>>>>>>>>>>>>>>>>>>>> >>>>> further >>>>>>>>>>>>>>>>>>>>> >>>>> actions were made when the hosted engine vm wasnt >>>>>>>>>>>>>>>>>>>>> >>>>> in a stable state. >>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>> >>>>> As for now, are you trying to revert back to a >>>>>>>>>>>>>>>>>>>>> >>>>> previous/initial >>>>>>>>>>>>>>>>>>>>> >>>>> state ? >>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>> >>>> I'm not trying to revert it to a previous state for >>>>>>>>>>>>>>>>>>>>> >>>> now. This was a >>>>>>>>>>>>>>>>>>>>> >>>> migration from a bare metal engine, and it didn't >>>>>>>>>>>>>>>>>>>>> >>>> report any error >>>>>>>>>>>>>>>>>>>>> >>>> during the migration. I'd had some problems on my >>>>>>>>>>>>>>>>>>>>> >>>> first attempts at >>>>>>>>>>>>>>>>>>>>> >>>> this migration, whereby it never completed (due to a >>>>>>>>>>>>>>>>>>>>> >>>> proxy issue) but >>>>>>>>>>>>>>>>>>>>> >>>> I managed to resolve this. Do you know of a way to >>>>>>>>>>>>>>>>>>>>> >>>> get the Hosted >>>>>>>>>>>>>>>>>>>>> >>>> Engine VM into a stable state, without rebuilding >>>>>>>>>>>>>>>>>>>>> >>>> the entire cluster >>>>>>>>>>>>>>>>>>>>> >>>> from scratch (since I have a lot of VMs on it)? >>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>> >>>> Thanks for any help. 
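
For the display-protocol mismatch discussed above, rather than editing osinfo-defaults.properties in place, the engine is supposed to pick up overrides dropped into /etc/ovirt-engine/osinfo.conf.d/; a sketch of adding the 4.x protocol list at the 3.6 level as well (the file name is only an example, and the override mechanism itself is an assumption here, not something verified in this thread):

    printf '%s\n' \
      'os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus' \
      > /etc/ovirt-engine/osinfo.conf.d/90-display-protocols.properties
    systemctl restart ovirt-engine
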
>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>> >>>> Regards, >>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>> >>>> Cam >>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>> >>>>> Regards, >>>>>>>>>>>>>>>>>>>>> >>>>> Yanir >>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>> >>>>> On Wed, Jun 21, 2017 at 4:32 PM, cmc >>>>>>>>>>>>>>>>>>>>> >>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>> Hi Jenny/Martin, >>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>> Any idea what I can do here? The hosted engine VM >>>>>>>>>>>>>>>>>>>>> >>>>>> has no log on any >>>>>>>>>>>>>>>>>>>>> >>>>>> host in /var/log/libvirt/qemu, and I fear that if >>>>>>>>>>>>>>>>>>>>> >>>>>> I need to put the >>>>>>>>>>>>>>>>>>>>> >>>>>> host into maintenance, e.g., to upgrade it that I >>>>>>>>>>>>>>>>>>>>> >>>>>> created it on >>>>>>>>>>>>>>>>>>>>> >>>>>> (which >>>>>>>>>>>>>>>>>>>>> >>>>>> I think is hosting it), or if it fails for any >>>>>>>>>>>>>>>>>>>>> >>>>>> reason, it won't get >>>>>>>>>>>>>>>>>>>>> >>>>>> migrated to another host, and I will not be able >>>>>>>>>>>>>>>>>>>>> >>>>>> to manage the >>>>>>>>>>>>>>>>>>>>> >>>>>> cluster. It seems to be a very dangerous position >>>>>>>>>>>>>>>>>>>>> >>>>>> to be in. >>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>> Cam >>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>> On Wed, Jun 21, 2017 at 11:48 AM, cmc >>>>>>>>>>>>>>>>>>>>> >>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>> Thanks Martin. The hosts are all part of the same >>>>>>>>>>>>>>>>>>>>> >>>>>>> cluster. >>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>> I get these errors in the engine.log on the >>>>>>>>>>>>>>>>>>>>> >>>>>>> engine: >>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z WARN >>>>>>>>>>>>>>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>>>>>>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation >>>>>>>>>>>>>>>>>>>>> >>>>>>> of action >>>>>>>>>>>>>>>>>>>>> >>>>>>> 'ImportVm' >>>>>>>>>>>>>>>>>>>>> >>>>>>> failed for user SYST >>>>>>>>>>>>>>>>>>>>> >>>>>>> EM. 
Reasons: >>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>> VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS >>>>>>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z INFO >>>>>>>>>>>>>>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>>>>>>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Lock freed >>>>>>>>>>>>>>>>>>>>> >>>>>>> to object >>>>>>>>>>>>>>>>>>>>> >>>>>>> 'EngineLock:{exclusiveLocks='[a >>>>>>>>>>>>>>>>>>>>> >>>>>>> 79e6b0e-fff4-4cba-a02c-4c00be151300=<VM, >>>>>>>>>>>>>>>>>>>>> >>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName >>>>>>>>>>>>>>>>>>>>> >>>>>>> HostedEngine>, >>>>>>>>>>>>>>>>>>>>> >>>>>>> HostedEngine=<VM_NAME, >>>>>>>>>>>>>>>>>>>>> >>>>>>> ACTION_TYPE_FAILED_NAME_ALREADY_USED>]', >>>>>>>>>>>>>>>>>>>>> >>>>>>> sharedLocks= >>>>>>>>>>>>>>>>>>>>> >>>>>>> '[a79e6b0e-fff4-4cba-a02c-4c00be151300=<REMOTE_VM, >>>>>>>>>>>>>>>>>>>>> >>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName >>>>>>>>>>>>>>>>>>>>> >>>>>>> HostedEngine>]'}' >>>>>>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z ERROR >>>>>>>>>>>>>>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.HostedEngineImporter] >>>>>>>>>>>>>>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Failed >>>>>>>>>>>>>>>>>>>>> >>>>>>> importing the Hosted >>>>>>>>>>>>>>>>>>>>> >>>>>>> Engine VM >>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>> The sanlock.log reports conflicts on that same >>>>>>>>>>>>>>>>>>>>> >>>>>>> host, and a >>>>>>>>>>>>>>>>>>>>> >>>>>>> different >>>>>>>>>>>>>>>>>>>>> >>>>>>> error on the other hosts, not sure if they are >>>>>>>>>>>>>>>>>>>>> >>>>>>> related. >>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>> And this in the >>>>>>>>>>>>>>>>>>>>> >>>>>>> /var/log/ovirt-hosted-engine-ha/agent log on the >>>>>>>>>>>>>>>>>>>>> >>>>>>> host >>>>>>>>>>>>>>>>>>>>> >>>>>>> which I deployed the hosted engine VM on: >>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>> 13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>>>>>>>>>>>>>>>>> >>>>>>> Unable to extract HEVM OVF >>>>>>>>>>>>>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>> 13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>>>>>>>>>>>>>>>>> >>>>>>> Failed extracting VM OVF from the OVF_STORE >>>>>>>>>>>>>>>>>>>>> >>>>>>> volume, falling back >>>>>>>>>>>>>>>>>>>>> >>>>>>> to >>>>>>>>>>>>>>>>>>>>> >>>>>>> initial vm.conf >>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>> I've seen some of these issues reported in >>>>>>>>>>>>>>>>>>>>> >>>>>>> bugzilla, but they were >>>>>>>>>>>>>>>>>>>>> >>>>>>> for >>>>>>>>>>>>>>>>>>>>> >>>>>>> older versions of oVirt (and appear to be >>>>>>>>>>>>>>>>>>>>> >>>>>>> resolved). >>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>> I will install that package on the other two >>>>>>>>>>>>>>>>>>>>> >>>>>>> hosts, for which I >>>>>>>>>>>>>>>>>>>>> >>>>>>> will >>>>>>>>>>>>>>>>>>>>> >>>>>>> put them in maintenance as vdsm is installed as >>>>>>>>>>>>>>>>>>>>> >>>>>>> an upgrade. I >>>>>>>>>>>>>>>>>>>>> >>>>>>> guess >>>>>>>>>>>>>>>>>>>>> >>>>>>> restarting vdsm is a good idea after that? 
>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>> Campbell >>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>> On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak >>>>>>>>>>>>>>>>>>>>> >>>>>>> <[email protected]> >>>>>>>>>>>>>>>>>>>>> >>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>> you do not have to install it on all hosts. But >>>>>>>>>>>>>>>>>>>>> >>>>>>>> you should have >>>>>>>>>>>>>>>>>>>>> >>>>>>>> more >>>>>>>>>>>>>>>>>>>>> >>>>>>>> than one and ideally all hosted engine enabled >>>>>>>>>>>>>>>>>>>>> >>>>>>>> nodes should >>>>>>>>>>>>>>>>>>>>> >>>>>>>> belong to >>>>>>>>>>>>>>>>>>>>> >>>>>>>> the same engine cluster. >>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>> Best regards >>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>> Martin Sivak >>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>> On Wed, Jun 21, 2017 at 11:29 AM, cmc >>>>>>>>>>>>>>>>>>>>> >>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Hi Jenny, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Does ovirt-hosted-engine-ha need to be >>>>>>>>>>>>>>>>>>>>> >>>>>>>>> installed across all >>>>>>>>>>>>>>>>>>>>> >>>>>>>>> hosts? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Could that be the reason it is failing to see >>>>>>>>>>>>>>>>>>>>> >>>>>>>>> it properly? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Cam >>>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>> On Mon, Jun 19, 2017 at 1:27 PM, cmc >>>>>>>>>>>>>>>>>>>>> >>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Hi Jenny, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Logs are attached. I can see errors in there, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> but am unsure how >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> they >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> arose. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Campbell >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> On Mon, Jun 19, 2017 at 12:29 PM, Evgenia Tokar >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> <[email protected]> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> From the output it looks like the agent is >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> down, try starting >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> it by >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> running: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> systemctl start ovirt-ha-agent. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> The engine is supposed to see the hosted >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> engine storage domain >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> import it >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> to the system, then it should import the >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> hosted engine vm. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Can you attach the agent log from the host >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-hosted-engine-ha/agent.log) >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> and the engine log from the engine vm >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-engine/engine.log)? 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Jenny >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> On Mon, Jun 19, 2017 at 12:41 PM, cmc >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> <[email protected]> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Hi Jenny, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What version are you running? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> 4.1.2.2-1.el7.centos >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> For the hosted engine vm to be imported and >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> displayed in the >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> engine, you >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> must first create a master storage domain. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> To provide a bit more detail: this was a >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> migration of a >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> bare-metal >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> engine in an existing cluster to a hosted >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> engine VM for that >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> cluster. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> As part of this migration, I built an >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> entirely new host and >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> ran >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> 'hosted-engine --deploy' (followed these >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> instructions: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_Metal_to_an_EL-Based_Self-Hosted_Environment/). >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> I restored the backup from the engine and it >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> completed >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> without any >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> errors. I didn't see any instructions >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> regarding a master >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> storage >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> domain in the page above. The cluster has >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> two existing master >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> storage >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> domains, one is fibre channel, which is up, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> and one ISO >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> domain, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> which >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> is currently offline. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What do you mean the hosted engine commands >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> are failing? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> happens >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> when >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> you run hosted-engine --vm-status now? 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Interestingly, whereas when I ran it before, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> it exited with >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> no >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> output >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> and a return code of '1', it now reports: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> --== Host 1 status ==-- >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> conf_on_shared_storage : True >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Status up-to-date : False >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Hostname : >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> kvm-ldn-03.ldn.fscfc.co.uk >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Host ID : 1 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Engine status : unknown >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> stale-data >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Score : 0 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> stopped : True >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Local maintenance : False >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> crc32 : 0217f07b >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> local_conf_timestamp : 2911 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Host timestamp : 2897 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Extra metadata (valid at timestamp): >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> metadata_parse_version=1 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> metadata_feature_version=1 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> timestamp=2897 (Thu Jun 15 16:22:54 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> 2017) >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> host-id=1 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> score=0 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> vm_conf_refresh_time=2911 (Thu Jun 15 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> 16:23:08 2017) >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> conf_on_shared_storage=True >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> maintenance=False >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> state=AgentStopped >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> stopped=True >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Yet I can login to the web GUI fine. I guess >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> it is not HA due >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> being >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> in an unknown state currently? Does the >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> hosted-engine-ha rpm >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> need >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> be installed across all nodes in the >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> cluster, btw? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Thanks for the help, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Jenny Tokar >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jun 15, 2017 at 6:32 PM, cmc >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> <[email protected]> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I've migrated from a bare-metal engine to >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> a hosted engine. 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> There >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> were >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> no errors during the install, however, the >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> hosted engine >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> did not >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> get >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> started. I tried running: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> hosted-engine --status >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> on the host I deployed it on, and it >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> returns nothing (exit >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> code >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> is 1 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> however). I could not ping it either. So I >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> tried starting >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> it via >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> 'hosted-engine --vm-start' and it returned: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Virtual machine does not exist >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> But it then became available. I logged >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> into it >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> successfully. It >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> is not >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> in the list of VMs however. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Any ideas why the hosted-engine commands >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> fail, and why it >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> is not >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> the list of virtual machines? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for any help, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Users mailing list >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> [email protected] >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Users mailing list >>>>>>>>>>>>>>>>>>>>> >>>>>>>>> [email protected] >>>>>>>>>>>>>>>>>>>>> >>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users >>>>>>>>>>>>>>>>>>>>> >>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>> >>>>>> Users mailing list >>>>>>>>>>>>>>>>>>>>> >>>>>> [email protected] >>>>>>>>>>>>>>>>>>>>> >>>>>> http://lists.ovirt.org/mailman/listinfo/users >>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>> > _______________________________________________ >>>>>>>>>>>>>>>>>>>>> > Users mailing list >>>>>>>>>>>>>>>>>>>>> > [email protected] >>>>>>>>>>>>>>>>>>>>> > http://lists.ovirt.org/mailman/listinfo/users >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> _______________________________________________ Users mailing list [email protected] http://lists.ovirt.org/mailman/listinfo/users
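
One footnote on the very first message quoted above: as far as I can tell, '--status' is not one of the hosted-engine sub-commands in 4.1, which would explain the empty output and the exit code of 1; the forms seen working elsewhere in this thread are:

    hosted-engine --vm-status
    hosted-engine --vm-start
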

