Help! I put the cluster into global maintenance, then powered off and powered on all of the nodes. I have taken it out of global maintenance. No VM has started, including the hosted engine. This is very bad. I am going to look through logs to see why nothing has started. Help greatly appreciated.
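A minimal first-pass triage for this state (a sketch only, assuming the service names that appear later in this thread):

--------------------8<-------------------
# on each hosted-engine host: did the HA stack come back after the power cycle?
systemctl status ovirt-ha-broker ovirt-ha-agent vdsmd sanlock
# make sure global maintenance is really off, then check what each host reports
hosted-engine --set-maintenance --mode=none
hosted-engine --vm-status
# as a last resort, start the engine VM by hand on one host
hosted-engine --vm-start
--------------------8<-------------------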
Thanks, Cam On Fri, Jun 30, 2017 at 1:00 PM, cmc <[email protected]> wrote: > So I can run from any node: hosted-engine --set-maintenance > --mode=global. By 'agents', you mean the ovirt-ha-agent, right? This > shouldn't affect the running of any VMs, correct? Sorry for the > questions, just want to do it correctly and not make assumptions :) > > Cheers, > > C > > On Fri, Jun 30, 2017 at 12:12 PM, Martin Sivak <[email protected]> wrote: >> Hi, >> >>> Just to clarify: you mean the host_id in >>> /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id, >>> correct? >> >> Exactly. >> >> Put the cluster to global maintenance first. Or kill all agents (has >> the same effect). >> >> Martin >> >> On Fri, Jun 30, 2017 at 12:47 PM, cmc <[email protected]> wrote: >>> Just to clarify: you mean the host_id in >>> /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id, >>> correct? >>> >>> On Fri, Jun 30, 2017 at 9:47 AM, Martin Sivak <[email protected]> wrote: >>>> Hi, >>>> >>>> cleaning metadata won't help in this case. Try transferring the >>>> spm_ids you got from the engine to the proper hosted engine hosts so >>>> the hosted engine ids match the spm_ids. Then restart all hosted >>>> engine services. I would actually recommend restarting all hosts after >>>> this change, but I have no idea how many VMs you have running. >>>> >>>> Martin >>>> >>>> On Thu, Jun 29, 2017 at 8:27 PM, cmc <[email protected]> wrote: >>>>> Tried running a 'hosted-engine --clean-metadata" as per >>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1350539, since >>>>> ovirt-ha-agent was not running anyway, but it fails with the following >>>>> error: >>>>> >>>>> ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed >>>>> to start monitoring domain >>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>> during domain acquisition >>>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Traceback (most recent >>>>> call last): >>>>> File >>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>>> line 191, in _run_agent >>>>> return action(he) >>>>> File >>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>>> line 67, in action_clean >>>>> return he.clean(options.force_cleanup) >>>>> File >>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>> line 345, in clean >>>>> self._initialize_domain_monitor() >>>>> File >>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>> line 823, in _initialize_domain_monitor >>>>> raise Exception(msg) >>>>> Exception: Failed to start monitoring domain >>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>> during domain acquisition >>>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Trying to restart agent >>>>> WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, >>>>> attempt '0' >>>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors >>>>> occurred, giving up. Please review the log and consider filing a bug. 
>>>>> INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down >>>>> >>>>> On Thu, Jun 29, 2017 at 6:10 PM, cmc <[email protected]> wrote: >>>>>> Actually, it looks like sanlock problems: >>>>>> >>>>>> "SanlockInitializationError: Failed to initialize sanlock, the >>>>>> number of errors has exceeded the limit" >>>>>> >>>>>> >>>>>> >>>>>> On Thu, Jun 29, 2017 at 5:10 PM, cmc <[email protected]> wrote: >>>>>>> Sorry, I am mistaken, two hosts failed for the agent with the following >>>>>>> error: >>>>>>> >>>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine >>>>>>> ERROR Failed to start monitoring domain >>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>>> during domain acquisition >>>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine >>>>>>> ERROR Shutting down the agent because of 3 failures in a row! >>>>>>> >>>>>>> What could cause these timeouts? Some other service not running? >>>>>>> >>>>>>> On Thu, Jun 29, 2017 at 5:03 PM, cmc <[email protected]> wrote: >>>>>>>> Both services are up on all three hosts. The broker logs just report: >>>>>>>> >>>>>>>> Thread-6549::INFO::2017-06-29 >>>>>>>> 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) >>>>>>>> Connection established >>>>>>>> Thread-6549::INFO::2017-06-29 >>>>>>>> 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) >>>>>>>> Connection closed >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Cam >>>>>>>> >>>>>>>> On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak <[email protected]> >>>>>>>> wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> please make sure that both ovirt-ha-agent and ovirt-ha-broker services >>>>>>>>> are restarted and up. The error says the agent can't talk to the >>>>>>>>> broker. Is there anything in the broker.log? >>>>>>>>> >>>>>>>>> Best regards >>>>>>>>> >>>>>>>>> Martin Sivak >>>>>>>>> >>>>>>>>> On Thu, Jun 29, 2017 at 4:42 PM, cmc <[email protected]> wrote: >>>>>>>>>> I've restarted those two services across all hosts, have taken the >>>>>>>>>> Hosted Engine host out of maintenance, and when I try to migrate the >>>>>>>>>> Hosted Engine over to another host, it reports that all three hosts >>>>>>>>>> 'did not satisfy internal filter HA because it is not a Hosted Engine >>>>>>>>>> host'.
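A sketch of the restart sequence Martin suggests above, with the broker restarted before the agent (per the error, it is the agent that connects to the broker):

--------------------8<-------------------
# on every hosted-engine host
systemctl restart ovirt-ha-broker
systemctl restart ovirt-ha-agent
systemctl status ovirt-ha-broker ovirt-ha-agent
journalctl -u ovirt-ha-agent -n 50   # look for fresh broker connection errors
--------------------8<-------------------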
>>>>>>>>>> >>>>>>>>>> On the host that the Hosted Engine is currently on it reports in the >>>>>>>>>> agent.log: >>>>>>>>>> >>>>>>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR >>>>>>>>>> Connection closed: Connection closed >>>>>>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent >>>>>>>>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception >>>>>>>>>> getting service path: Connection closed >>>>>>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent >>>>>>>>>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent >>>>>>>>>> call last): >>>>>>>>>> File >>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>>>>>>>> line 191, in _run_agent >>>>>>>>>> return >>>>>>>>>> action(he) >>>>>>>>>> File >>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>>>>>>>> line 64, in action_proper >>>>>>>>>> return >>>>>>>>>> he.start_monitoring() >>>>>>>>>> File >>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>> line 411, in start_monitoring >>>>>>>>>> >>>>>>>>>> self._initialize_sanlock() >>>>>>>>>> File >>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>> line 691, in _initialize_sanlock >>>>>>>>>> >>>>>>>>>> constants.SERVICE_TYPE + constants.LOCKSPACE_EXTENSION) >>>>>>>>>> File >>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", >>>>>>>>>> line 162, in get_service_path >>>>>>>>>> >>>>>>>>>> .format(str(e))) >>>>>>>>>> RequestError: >>>>>>>>>> Failed >>>>>>>>>> to get service path: Connection closed >>>>>>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent >>>>>>>>>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart >>>>>>>>>> agent >>>>>>>>>> >>>>>>>>>> On Thu, Jun 29, 2017 at 1:25 PM, Martin Sivak <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> yep, you have to restart the ovirt-ha-agent and ovirt-ha-broker >>>>>>>>>>> services. >>>>>>>>>>> >>>>>>>>>>> The scheduling message just means that the host has score 0 or is >>>>>>>>>>> not >>>>>>>>>>> reporting score at all. >>>>>>>>>>> >>>>>>>>>>> Martin >>>>>>>>>>> >>>>>>>>>>> On Thu, Jun 29, 2017 at 1:33 PM, cmc <[email protected]> wrote: >>>>>>>>>>>> Thanks Martin, do I have to restart anything? When I try to use the >>>>>>>>>>>> 'migrate' operation, it complains that the other two hosts 'did not >>>>>>>>>>>> satisfy internal filter HA because it is not a Hosted Engine >>>>>>>>>>>> host..' >>>>>>>>>>>> (even though I reinstalled both these hosts with the 'deploy hosted >>>>>>>>>>>> engine' option, which suggests that something needs restarting. >>>>>>>>>>>> Should >>>>>>>>>>>> I worry about the sanlock errors, or will that be resolved by the >>>>>>>>>>>> change in host_id? >>>>>>>>>>>> >>>>>>>>>>>> Kind regards, >>>>>>>>>>>> >>>>>>>>>>>> Cam >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Jun 29, 2017 at 12:22 PM, Martin Sivak <[email protected]> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> Change the ids so they are distinct. I need to check if there is >>>>>>>>>>>>> a way >>>>>>>>>>>>> to read the SPM ids from the engine as using the same numbers >>>>>>>>>>>>> would be >>>>>>>>>>>>> the best. >>>>>>>>>>>>> >>>>>>>>>>>>> Martin >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jun 29, 2017 at 12:46 PM, cmc <[email protected]> wrote: >>>>>>>>>>>>>> Is there any way of recovering from this situation? 
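On Martin's point above about reading the SPM ids from the engine: a hedged sketch, assuming a standard engine Postgres setup and that this schema version has the vds_spm_id_map table (verify both before relying on it):

--------------------8<-------------------
# on the engine VM
sudo -u postgres psql engine -c "SELECT vds_id, vds_spm_id FROM vds_spm_id_map;"
# then, on each host, set host_id in /etc/ovirt-hosted-engine/hosted-engine.conf
# to that host's vds_spm_id and restart ovirt-ha-broker and ovirt-ha-agent
--------------------8<-------------------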
I'd prefer >>>>>>>>>>>>>> to fix >>>>>>>>>>>>>> the issue rather than re-deploy, but if there is no recovery >>>>>>>>>>>>>> path, I >>>>>>>>>>>>>> could perhaps try re-deploying the hosted engine. In which case, >>>>>>>>>>>>>> would >>>>>>>>>>>>>> the best option be to take a backup of the Hosted Engine, and >>>>>>>>>>>>>> then >>>>>>>>>>>>>> shut it down, re-initialise the SAN partition (or use another >>>>>>>>>>>>>> partition) and retry the deployment? Would it be better to use >>>>>>>>>>>>>> the >>>>>>>>>>>>>> older backup from the bare metal engine that I originally used, >>>>>>>>>>>>>> or use >>>>>>>>>>>>>> a backup from the Hosted Engine? I'm not sure if any VMs have >>>>>>>>>>>>>> been >>>>>>>>>>>>>> added since switching to Hosted Engine. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Unfortunately I have very little time left to get this working >>>>>>>>>>>>>> before >>>>>>>>>>>>>> I have to hand it over for eval (by end of Friday). >>>>>>>>>>>>>> >>>>>>>>>>>>>> Here are some log snippets from the cluster that are current >>>>>>>>>>>>>> >>>>>>>>>>>>>> In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2017-06-29 10:50:15,071+0100 INFO (monitor/207221b) >>>>>>>>>>>>>> [storage.SANLock] >>>>>>>>>>>>>> Acquiring host id for domain >>>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f (id: >>>>>>>>>>>>>> 3) (clusterlock:282) >>>>>>>>>>>>>> 2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) >>>>>>>>>>>>>> [storage.Monitor] >>>>>>>>>>>>>> Error acquiring host id 3 for domain >>>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f (monitor:558) >>>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>>> File "/usr/share/vdsm/storage/monitor.py", line 555, in >>>>>>>>>>>>>> _acquireHostId >>>>>>>>>>>>>> self.domain.acquireHostId(self.hostId, async=True) >>>>>>>>>>>>>> File "/usr/share/vdsm/storage/sd.py", line 790, in >>>>>>>>>>>>>> acquireHostId >>>>>>>>>>>>>> self._manifest.acquireHostId(hostId, async) >>>>>>>>>>>>>> File "/usr/share/vdsm/storage/sd.py", line 449, in >>>>>>>>>>>>>> acquireHostId >>>>>>>>>>>>>> self._domainLock.acquireHostId(hostId, async) >>>>>>>>>>>>>> File >>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", >>>>>>>>>>>>>> line 297, in acquireHostId >>>>>>>>>>>>>> raise se.AcquireHostIdFailure(self._sdUUID, e) >>>>>>>>>>>>>> AcquireHostIdFailure: Cannot acquire host id: >>>>>>>>>>>>>> ('207221b2-959b-426b-b945-18e1adfed62f', SanlockException(22, >>>>>>>>>>>>>> 'Sanlock >>>>>>>>>>>>>> lockspace add failure', 'Invalid argument')) >>>>>>>>>>>>>> >>>>>>>>>>>>>> From /var/log/ovirt-hosted-engine-ha/agent.log on the same host: >>>>>>>>>>>>>> >>>>>>>>>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>>>>>> 13:30:50,592::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) >>>>>>>>>>>>>> Failed to start monitoring domain >>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): >>>>>>>>>>>>>> timeout >>>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>>> MainThread::WARNING::2017-06-19 >>>>>>>>>>>>>> 13:30:50,593::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>>> Error while monitoring engine: Failed to start monitoring domain >>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): >>>>>>>>>>>>>> timeout >>>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>>> MainThread::WARNING::2017-06-19 >>>>>>>>>>>>>> 
13:30:50,593::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>>> Unexpected error >>>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>>> File >>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>>>>> line 443, in start_monitoring >>>>>>>>>>>>>> self._initialize_domain_monitor() >>>>>>>>>>>>>> File >>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>>>>> line 823, in _initialize_domain_monitor >>>>>>>>>>>>>> raise Exception(msg) >>>>>>>>>>>>>> Exception: Failed to start monitoring domain >>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): >>>>>>>>>>>>>> timeout >>>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>>>>>> 13:30:50,593::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>>> Shutting down the agent because of 3 failures in a row! >>>>>>>>>>>>>> >>>>>>>>>>>>>> From sanlock.log: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2017-06-29 11:17:06+0100 1194149 [2530]: add_lockspace >>>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 >>>>>>>>>>>>>> conflicts with name of list1 s5 >>>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 >>>>>>>>>>>>>> >>>>>>>>>>>>>> From the two other hosts: >>>>>>>>>>>>>> >>>>>>>>>>>>>> host 2: >>>>>>>>>>>>>> >>>>>>>>>>>>>> vdsm.log >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2017-06-29 10:53:47,755+0100 ERROR (jsonrpc/4) >>>>>>>>>>>>>> [jsonrpc.JsonRpcServer] >>>>>>>>>>>>>> Internal server error (__init__:570) >>>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>>> File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", >>>>>>>>>>>>>> line >>>>>>>>>>>>>> 565, in _handle_request >>>>>>>>>>>>>> res = method(**params) >>>>>>>>>>>>>> File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", >>>>>>>>>>>>>> line >>>>>>>>>>>>>> 202, in _dynamicMethod >>>>>>>>>>>>>> result = fn(*methodArgs) >>>>>>>>>>>>>> File "/usr/share/vdsm/API.py", line 1454, in >>>>>>>>>>>>>> getAllVmIoTunePolicies >>>>>>>>>>>>>> io_tune_policies_dict = self._cif.getAllVmIoTunePolicies() >>>>>>>>>>>>>> File "/usr/share/vdsm/clientIF.py", line 448, in >>>>>>>>>>>>>> getAllVmIoTunePolicies >>>>>>>>>>>>>> 'current_values': v.getIoTune()} >>>>>>>>>>>>>> File "/usr/share/vdsm/virt/vm.py", line 2803, in getIoTune >>>>>>>>>>>>>> result = self.getIoTuneResponse() >>>>>>>>>>>>>> File "/usr/share/vdsm/virt/vm.py", line 2816, in >>>>>>>>>>>>>> getIoTuneResponse >>>>>>>>>>>>>> res = self._dom.blockIoTune( >>>>>>>>>>>>>> File >>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line >>>>>>>>>>>>>> 47, in __getattr__ >>>>>>>>>>>>>> % self.vmid) >>>>>>>>>>>>>> NotConnectedError: VM u'a79e6b0e-fff4-4cba-a02c-4c00be151300' >>>>>>>>>>>>>> was not >>>>>>>>>>>>>> started yet or was shut down >>>>>>>>>>>>>> >>>>>>>>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log >>>>>>>>>>>>>> >>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>> 10:56:33,636::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) >>>>>>>>>>>>>> Found OVF_STORE: imgUUID:222610db-7880-4f4f-8559-a3635fd73555, >>>>>>>>>>>>>> volUUID:c6e0d29b-eabf-4a09-a330-df54cfdd73f1 >>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>> 10:56:33,926::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) 
>>>>>>>>>>>>>> Extracting Engine VM OVF from the OVF_STORE >>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>> 10:56:33,938::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>>>>>>>>>> OVF_STORE volume path: >>>>>>>>>>>>>> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/images/222610db-7880-4f4f-8559-a3635fd73555/c6e0d29b-eabf-4a09-a330-df54cfdd73f1 >>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>> 10:56:33,967::config::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>>>>>>>>>> Found an OVF for HE VM, trying to convert >>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>> 10:56:33,971::config::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>>>>>>>>>> Got vm.conf from OVF_STORE >>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>> 10:56:36,736::states::678::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) >>>>>>>>>>>>>> Score is 0 due to unexpected vm shutdown at Thu Jun 29 10:53:59 >>>>>>>>>>>>>> 2017 >>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>> 10:56:36,736::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>>> Current state EngineUnexpectedlyDown (score: 0) >>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>> 10:56:46,772::config::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) >>>>>>>>>>>>>> Reloading vm.conf from the shared storage domain >>>>>>>>>>>>>> >>>>>>>>>>>>>> /var/log/messages: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jun 29 10:53:46 kvm-ldn-02 kernel: dd: sending ioctl 80306d02 to >>>>>>>>>>>>>> a partition! >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> host 1: >>>>>>>>>>>>>> >>>>>>>>>>>>>> /var/log/messages also in sanlock.log >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jun 29 11:01:02 kvm-ldn-01 sanlock[2400]: 2017-06-29 >>>>>>>>>>>>>> 11:01:02+0100 >>>>>>>>>>>>>> 678325 [9132]: s4531 delta_acquire host_id 1 busy1 1 2 1193177 >>>>>>>>>>>>>> 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03 >>>>>>>>>>>>>> Jun 29 11:01:03 kvm-ldn-01 sanlock[2400]: 2017-06-29 >>>>>>>>>>>>>> 11:01:03+0100 >>>>>>>>>>>>>> 678326 [24159]: s4531 add_lockspace fail result -262 >>>>>>>>>>>>>> >>>>>>>>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log: >>>>>>>>>>>>>> >>>>>>>>>>>>>> MainThread::ERROR::2017-06-27 >>>>>>>>>>>>>> 15:21:01,143::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) >>>>>>>>>>>>>> Failed to start monitoring domain >>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): >>>>>>>>>>>>>> timeout >>>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>>> MainThread::WARNING::2017-06-27 >>>>>>>>>>>>>> 15:21:01,144::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>>> Error while monitoring engine: Failed to start monitoring domain >>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): >>>>>>>>>>>>>> timeout >>>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>>> MainThread::WARNING::2017-06-27 >>>>>>>>>>>>>> 15:21:01,144::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>>> Unexpected error >>>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>>> File >>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>>>>> line 
443, in start_monitoring >>>>>>>>>>>>>> self._initialize_domain_monitor() >>>>>>>>>>>>>> File >>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>>>>> line 823, in _initialize_domain_monitor >>>>>>>>>>>>>> raise Exception(msg) >>>>>>>>>>>>>> Exception: Failed to start monitoring domain >>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): >>>>>>>>>>>>>> timeout >>>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>>> MainThread::ERROR::2017-06-27 >>>>>>>>>>>>>> 15:21:01,144::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>>> Shutting down the agent because of 3 failures in a row! >>>>>>>>>>>>>> MainThread::INFO::2017-06-27 >>>>>>>>>>>>>> 15:21:06,717::hosted_engine::848::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) >>>>>>>>>>>>>> VDSM domain monitor status: PENDING >>>>>>>>>>>>>> MainThread::INFO::2017-06-27 >>>>>>>>>>>>>> 15:21:09,335::hosted_engine::776::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) >>>>>>>>>>>>>> Failed to stop monitoring domain >>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f): Storage domain is >>>>>>>>>>>>>> member of pool: u'domain=207221b2-959b-426b-b945-18e1adfed62f' >>>>>>>>>>>>>> MainThread::INFO::2017-06-27 >>>>>>>>>>>>>> 15:21:09,339::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(run) >>>>>>>>>>>>>> Agent shutting down >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for any help, >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cam >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Jun 28, 2017 at 11:25 AM, cmc <[email protected]> wrote: >>>>>>>>>>>>>>> Hi Martin, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> yes, on two of the machines they have the same host_id. The >>>>>>>>>>>>>>> other has >>>>>>>>>>>>>>> a different host_id. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> To update since yesterday: I reinstalled and deployed Hosted >>>>>>>>>>>>>>> Engine on >>>>>>>>>>>>>>> the other host (so all three hosts in the cluster now have it >>>>>>>>>>>>>>> installed). The second one I deployed said it was able to host >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> engine (unlike the first I reinstalled), so I tried putting the >>>>>>>>>>>>>>> host >>>>>>>>>>>>>>> with the Hosted Engine on it into maintenance to see if it would >>>>>>>>>>>>>>> migrate over. It managed to move all hosts but the Hosted >>>>>>>>>>>>>>> Engine. And >>>>>>>>>>>>>>> now the host that said it was able to host the engine says >>>>>>>>>>>>>>> 'unavailable due to HA score'. The host that it was trying to >>>>>>>>>>>>>>> move >>>>>>>>>>>>>>> from is now in 'preparing for maintenance' for the last 12 >>>>>>>>>>>>>>> hours. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The summary is: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> kvm-ldn-01 - one of the original, pre-Hosted Engine hosts, >>>>>>>>>>>>>>> reinstalled >>>>>>>>>>>>>>> with 'Deploy Hosted Engine'. No icon saying it can host the >>>>>>>>>>>>>>> Hosted >>>>>>>>>>>>>>> Engine, host_id of '2' in >>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf. >>>>>>>>>>>>>>> 'add_lockspace' fails in sanlock.log >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> kvm-ldn-02 - the other host that was pre-existing before Hosted >>>>>>>>>>>>>>> Engine >>>>>>>>>>>>>>> was created. Reinstalled with 'Deploy Hosted Engine'.
Had an >>>>>>>>>>>>>>> icon >>>>>>>>>>>>>>> saying that it was able to host the Hosted Engine, but after >>>>>>>>>>>>>>> migration >>>>>>>>>>>>>>> was attempted when putting kvm-ldn-03 into maintenance, it >>>>>>>>>>>>>>> reports: >>>>>>>>>>>>>>> 'unavailable due to HA score'. It has a host_id of '1' in >>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf. No errors in >>>>>>>>>>>>>>> sanlock.log >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> kvm-ldn-03 - this was the host I deployed Hosted Engine on, >>>>>>>>>>>>>>> which was >>>>>>>>>>>>>>> not part of the original cluster. I restored the bare-metal >>>>>>>>>>>>>>> engine >>>>>>>>>>>>>>> backup in the Hosted Engine on this host when deploying it, >>>>>>>>>>>>>>> without >>>>>>>>>>>>>>> error. It currently has the Hosted Engine on it (as the only VM >>>>>>>>>>>>>>> after >>>>>>>>>>>>>>> I put that host into maintenance to test the HA of Hosted >>>>>>>>>>>>>>> Engine). >>>>>>>>>>>>>>> Sanlock log shows conflicts >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I will look through all the logs for any other errors. Please >>>>>>>>>>>>>>> let me >>>>>>>>>>>>>>> know if you need any logs or other clarification/information. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Campbell >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Jun 28, 2017 at 9:25 AM, Martin Sivak >>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> can you please check the contents of >>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf or >>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine-ha/agent.conf (I am not sure which >>>>>>>>>>>>>>>> one it is >>>>>>>>>>>>>>>> right now) and search for host-id? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Make sure the IDs are different. If they are not, then there >>>>>>>>>>>>>>>> is a bug somewhere. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Martin >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 6:26 PM, cmc <[email protected]> wrote: >>>>>>>>>>>>>>>>> I see this on the host it is trying to migrate in >>>>>>>>>>>>>>>>> /var/log/sanlock: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2017-06-27 17:10:40+0100 527703 [2407]: s3528 lockspace >>>>>>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 >>>>>>>>>>>>>>>>> 2017-06-27 17:13:00+0100 527843 [27446]: s3528 delta_acquire >>>>>>>>>>>>>>>>> host_id 1 >>>>>>>>>>>>>>>>> busy1 1 2 1042692 >>>>>>>>>>>>>>>>> 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03 >>>>>>>>>>>>>>>>> 2017-06-27 17:13:01+0100 527844 [2407]: s3528 add_lockspace >>>>>>>>>>>>>>>>> fail result -262 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The sanlock service is running. Why would this occur? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> C >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 5:21 PM, cmc <[email protected]> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> Hi Martin, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks for the reply. I have done this, and the deployment >>>>>>>>>>>>>>>>>> completed >>>>>>>>>>>>>>>>>> without error. However, it still will not allow the Hosted >>>>>>>>>>>>>>>>>> Engine >>>>>>>>>>>>>>>>>> migrate to another host. 
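Given the duplicate host_id summary above, a sketch for checking which host id each node actually holds in the hosted-engine lockspace:

--------------------8<-------------------
# on each host: list registered sanlock lockspaces and their host ids
sanlock client status
# two nodes joining the 207221b2-959b-426b-b945-18e1adfed62f lockspace with
# the same host id is what produces the "delta_acquire host_id 1 busy1" and
# "add_lockspace fail result -262" pairs quoted above
--------------------8<-------------------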
The >>>>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf got created ok >>>>>>>>>>>>>>>>>> on the host >>>>>>>>>>>>>>>>>> I re-installed, but the ovirt-ha-broker.service, though it >>>>>>>>>>>>>>>>>> starts, >>>>>>>>>>>>>>>>>> reports: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> --------------------8<------------------- >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Jun 27 14:58:26 kvm-ldn-01 systemd[1]: Starting oVirt Hosted >>>>>>>>>>>>>>>>>> Engine >>>>>>>>>>>>>>>>>> High Availability Communications Broker... >>>>>>>>>>>>>>>>>> Jun 27 14:58:27 kvm-ldn-01 ovirt-ha-broker[6101]: >>>>>>>>>>>>>>>>>> ovirt-ha-broker >>>>>>>>>>>>>>>>>> ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker >>>>>>>>>>>>>>>>>> ERROR >>>>>>>>>>>>>>>>>> Failed to read metadata from >>>>>>>>>>>>>>>>>> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata >>>>>>>>>>>>>>>>>> Traceback >>>>>>>>>>>>>>>>>> (most >>>>>>>>>>>>>>>>>> recent call last): >>>>>>>>>>>>>>>>>> File >>>>>>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", >>>>>>>>>>>>>>>>>> line 129, in get_raw_stats_for_service_type >>>>>>>>>>>>>>>>>> f = >>>>>>>>>>>>>>>>>> os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC) >>>>>>>>>>>>>>>>>> OSError: >>>>>>>>>>>>>>>>>> [Errno 2] >>>>>>>>>>>>>>>>>> No such file or directory: >>>>>>>>>>>>>>>>>> '/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata' >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> --------------------8<------------------- >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I checked the path, and it exists. I can run 'less -f' on it >>>>>>>>>>>>>>>>>> fine. The >>>>>>>>>>>>>>>>>> perms are slightly different on the host that is running the >>>>>>>>>>>>>>>>>> VM vs the >>>>>>>>>>>>>>>>>> one that is reporting errors (600 vs 660), ownership is >>>>>>>>>>>>>>>>>> vdsm:qemu. Is >>>>>>>>>>>>>>>>>> this a san locking issue? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks for any help, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:41 PM, Martin Sivak >>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>> Should it be? It was not in the instructions for the >>>>>>>>>>>>>>>>>>>> migration from >>>>>>>>>>>>>>>>>>>> bare-metal to Hosted VM >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The hosted engine will only migrate to hosts that have the >>>>>>>>>>>>>>>>>>> services >>>>>>>>>>>>>>>>>>> running. Please put one other host to maintenance and >>>>>>>>>>>>>>>>>>> select Hosted >>>>>>>>>>>>>>>>>>> engine action: DEPLOY in the reinstall dialog. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Best regards >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Martin Sivak >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:23 PM, cmc <[email protected]> >>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> I changed the >>>>>>>>>>>>>>>>>>>> 'os.other.devices.display.protocols.value.3.6 = >>>>>>>>>>>>>>>>>>>> spice/qxl,vnc/cirrus,vnc/qxl' line to have the same >>>>>>>>>>>>>>>>>>>> display protocols >>>>>>>>>>>>>>>>>>>> as 4 and the hosted engine now appears in the list of VMs. >>>>>>>>>>>>>>>>>>>> I am >>>>>>>>>>>>>>>>>>>> guessing the compatibility version was causing it to use >>>>>>>>>>>>>>>>>>>> the 3.6 >>>>>>>>>>>>>>>>>>>> version. However, I am still unable to migrate the engine >>>>>>>>>>>>>>>>>>>> VM to >>>>>>>>>>>>>>>>>>>> another host. 
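On the broker's earlier 'No such file or directory' traceback: the os.open() there uses a direct-I/O flag, so a plain 'less' succeeding does not prove much. A sketch for checking the link and a direct read as the vdsm user (path as quoted above):

--------------------8<-------------------
ls -lL /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/
sudo -u vdsm dd of=/dev/null bs=512 count=1 iflag=direct \
  if=/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata
--------------------8<-------------------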
When I try putting the host it is currently >>>>>>>>>>>>>>>>>>>> on into >>>>>>>>>>>>>>>>>>>> maintenance, it reports: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Error while executing action: Cannot switch the Host(s) to >>>>>>>>>>>>>>>>>>>> Maintenance mode. >>>>>>>>>>>>>>>>>>>> There are no available hosts capable of running the engine >>>>>>>>>>>>>>>>>>>> VM. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Running 'hosted-engine --vm-status' still shows 'Engine >>>>>>>>>>>>>>>>>>>> status: >>>>>>>>>>>>>>>>>>>> unknown stale-data'. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The ovirt-ha-broker service is only running on one host. >>>>>>>>>>>>>>>>>>>> It was set to >>>>>>>>>>>>>>>>>>>> 'disabled' in systemd. It won't start as there is no >>>>>>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf on the other >>>>>>>>>>>>>>>>>>>> two hosts. >>>>>>>>>>>>>>>>>>>> Should it be? It was not in the instructions for the >>>>>>>>>>>>>>>>>>>> migration from >>>>>>>>>>>>>>>>>>>> bare-metal to Hosted VM >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 1:07 PM, cmc <[email protected]> >>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> Hi Tomas, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> So in my >>>>>>>>>>>>>>>>>>>>> /usr/share/ovirt-engine/conf/osinfo-defaults.properties >>>>>>>>>>>>>>>>>>>>> on my >>>>>>>>>>>>>>>>>>>>> engine VM, I have: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> os.other.devices.display.protocols.value = >>>>>>>>>>>>>>>>>>>>> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus >>>>>>>>>>>>>>>>>>>>> os.other.devices.display.protocols.value.3.6 = >>>>>>>>>>>>>>>>>>>>> spice/qxl,vnc/cirrus,vnc/qxl >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> That seems to match - I assume since this is 4.1, the 3.6 >>>>>>>>>>>>>>>>>>>>> should not apply >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Is there somewhere else I should be looking? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek >>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek >>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> > On 22 Jun 2017, at 12:31, Martin Sivak >>>>>>>>>>>>>>>>>>>>>>> > <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > Tomas, what fields are needed in a VM to pass the >>>>>>>>>>>>>>>>>>>>>>> > check that causes >>>>>>>>>>>>>>>>>>>>>>> > the following error? >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> >>>>> WARN >>>>>>>>>>>>>>>>>>>>>>> >>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>>>>>>>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation >>>>>>>>>>>>>>>>>>>>>>> >>>>> of action >>>>>>>>>>>>>>>>>>>>>>> >>>>> 'ImportVm' >>>>>>>>>>>>>>>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: >>>>>>>>>>>>>>>>>>>>>>> >>>>> VAR__ACTION__IMPORT >>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> to match the OS and VM Display type;-) >>>>>>>>>>>>>>>>>>>>>>> Configuration is in osinfo….e.g. 
if that is import from >>>>>>>>>>>>>>>>>>>>>>> older releases on >>>>>>>>>>>>>>>>>>>>>>> Linux this is typically caused by the change of cirrus >>>>>>>>>>>>>>>>>>>>>>> to vga for non-SPICE >>>>>>>>>>>>>>>>>>>>>>> VMs >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> yep, the default supported combinations for 4.0+ is this: >>>>>>>>>>>>>>>>>>>>>> os.other.devices.display.protocols.value = >>>>>>>>>>>>>>>>>>>>>> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > Thanks. >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > On Thu, Jun 22, 2017 at 12:19 PM, cmc >>>>>>>>>>>>>>>>>>>>>>> > <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >> Hi Martin, >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>>>> >>> just as a random comment, do you still have the >>>>>>>>>>>>>>>>>>>>>>> >>> database backup from >>>>>>>>>>>>>>>>>>>>>>> >>> the bare metal -> VM attempt? It might be possible >>>>>>>>>>>>>>>>>>>>>>> >>> to just try again >>>>>>>>>>>>>>>>>>>>>>> >>> using it. Or in the worst case.. update the >>>>>>>>>>>>>>>>>>>>>>> >>> offending value there >>>>>>>>>>>>>>>>>>>>>>> >>> before restoring it to the new engine instance. >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> I still have the backup. I'd rather do the latter, >>>>>>>>>>>>>>>>>>>>>>> >> as re-running the >>>>>>>>>>>>>>>>>>>>>>> >> HE deployment is quite lengthy and involved (I have >>>>>>>>>>>>>>>>>>>>>>> >> to re-initialise >>>>>>>>>>>>>>>>>>>>>>> >> the FC storage each time). Do you know what the >>>>>>>>>>>>>>>>>>>>>>> >> offending value(s) >>>>>>>>>>>>>>>>>>>>>>> >> would be? Would it be in the Postgres DB or in a >>>>>>>>>>>>>>>>>>>>>>> >> config file >>>>>>>>>>>>>>>>>>>>>>> >> somewhere? >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> Cheers, >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> Cam >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >>> Regards >>>>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>>>> >>> Martin Sivak >>>>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>>>> >>> On Thu, Jun 22, 2017 at 11:39 AM, cmc >>>>>>>>>>>>>>>>>>>>>>> >>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>> Hi Yanir, >>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>> >>>> Thanks for the reply. >>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>> First of all, maybe a chain reaction of : >>>>>>>>>>>>>>>>>>>>>>> >>>>> WARN >>>>>>>>>>>>>>>>>>>>>>> >>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>>>>>>>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation >>>>>>>>>>>>>>>>>>>>>>> >>>>> of action >>>>>>>>>>>>>>>>>>>>>>> >>>>> 'ImportVm' >>>>>>>>>>>>>>>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: >>>>>>>>>>>>>>>>>>>>>>> >>>>> VAR__ACTION__IMPORT >>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS >>>>>>>>>>>>>>>>>>>>>>> >>>>> is causing the hosted engine vm not to be set up >>>>>>>>>>>>>>>>>>>>>>> >>>>> correctly and >>>>>>>>>>>>>>>>>>>>>>> >>>>> further >>>>>>>>>>>>>>>>>>>>>>> >>>>> actions were made when the hosted engine vm wasn't >>>>>>>>>>>>>>>>>>>>>>> >>>>> in a stable state. >>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>> As for now, are you trying to revert back to a >>>>>>>>>>>>>>>>>>>>>>> >>>>> previous/initial >>>>>>>>>>>>>>>>>>>>>>> >>>>> state ?
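On the osinfo edit discussed above: changes made directly to /usr/share/ovirt-engine/conf/osinfo-defaults.properties are overwritten on upgrade; the usual pattern (a sketch, assuming a standard 4.1 engine layout, file name hypothetical) is a numbered drop-in that the engine reads on restart:

--------------------8<-------------------
# /etc/ovirt-engine/osinfo.conf.d/90-display.properties  (hypothetical name)
os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus

# then, on the engine VM:
systemctl restart ovirt-engine
--------------------8<-------------------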
>>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>> >>>> I'm not trying to revert it to a previous state >>>>>>>>>>>>>>>>>>>>>>> >>>> for now. This was a >>>>>>>>>>>>>>>>>>>>>>> >>>> migration from a bare metal engine, and it didn't >>>>>>>>>>>>>>>>>>>>>>> >>>> report any error >>>>>>>>>>>>>>>>>>>>>>> >>>> during the migration. I'd had some problems on my >>>>>>>>>>>>>>>>>>>>>>> >>>> first attempts at >>>>>>>>>>>>>>>>>>>>>>> >>>> this migration, whereby it never completed (due to >>>>>>>>>>>>>>>>>>>>>>> >>>> a proxy issue) but >>>>>>>>>>>>>>>>>>>>>>> >>>> I managed to resolve this. Do you know of a way to >>>>>>>>>>>>>>>>>>>>>>> >>>> get the Hosted >>>>>>>>>>>>>>>>>>>>>>> >>>> Engine VM into a stable state, without rebuilding >>>>>>>>>>>>>>>>>>>>>>> >>>> the entire cluster >>>>>>>>>>>>>>>>>>>>>>> >>>> from scratch (since I have a lot of VMs on it)? >>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>> >>>> Thanks for any help. >>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>> >>>> Regards, >>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>> >>>> Cam >>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>> Regards, >>>>>>>>>>>>>>>>>>>>>>> >>>>> Yanir >>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>> On Wed, Jun 21, 2017 at 4:32 PM, cmc >>>>>>>>>>>>>>>>>>>>>>> >>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>> Hi Jenny/Martin, >>>>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>> Any idea what I can do here? The hosted engine >>>>>>>>>>>>>>>>>>>>>>> >>>>>> VM has no log on any >>>>>>>>>>>>>>>>>>>>>>> >>>>>> host in /var/log/libvirt/qemu, and I fear that >>>>>>>>>>>>>>>>>>>>>>> >>>>>> if I need to put the >>>>>>>>>>>>>>>>>>>>>>> >>>>>> host into maintenance, e.g., to upgrade it that >>>>>>>>>>>>>>>>>>>>>>> >>>>>> I created it on >>>>>>>>>>>>>>>>>>>>>>> >>>>>> (which >>>>>>>>>>>>>>>>>>>>>>> >>>>>> I think is hosting it), or if it fails for any >>>>>>>>>>>>>>>>>>>>>>> >>>>>> reason, it won't get >>>>>>>>>>>>>>>>>>>>>>> >>>>>> migrated to another host, and I will not be able >>>>>>>>>>>>>>>>>>>>>>> >>>>>> to manage the >>>>>>>>>>>>>>>>>>>>>>> >>>>>> cluster. It seems to be a very dangerous >>>>>>>>>>>>>>>>>>>>>>> >>>>>> position to be in. >>>>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>> Cam >>>>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>> On Wed, Jun 21, 2017 at 11:48 AM, cmc >>>>>>>>>>>>>>>>>>>>>>> >>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Thanks Martin. The hosts are all part of the >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> same cluster. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> I get these errors in the engine.log on the >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> engine: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z WARN >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Validation of action >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 'ImportVm' >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> failed for user SYST >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> EM. 
Reasons: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z INFO >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Lock >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> freed to object >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 'EngineLock:{exclusiveLocks='[a >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 79e6b0e-fff4-4cba-a02c-4c00be151300=<VM, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> HostedEngine>, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> HostedEngine=<VM_NAME, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> ACTION_TYPE_FAILED_NAME_ALREADY_USED>]', >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> sharedLocks= >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> '[a79e6b0e-fff4-4cba-a02c-4c00be151300=<REMOTE_VM, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> HostedEngine>]'}' >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z ERROR >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.HostedEngineImporter] >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Failed >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> importing the Hosted >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Engine VM >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> The sanlock.log reports conflicts on that same >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> host, and a >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> different >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> error on the other hosts, not sure if they are >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> related. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> And this in the >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> /var/log/ovirt-hosted-engine-ha/agent log on the >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> host >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> which I deployed the hosted engine VM on: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Unable to extract HEVM OVF >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Failed extracting VM OVF from the OVF_STORE >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> volume, falling back >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> to >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> initial vm.conf >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> I've seen some of these issues reported in >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> bugzilla, but they were >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> for >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> older versions of oVirt (and appear to be >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> resolved). >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> I will install that package on the other two >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> hosts, for which I >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> will >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> put them in maintenance as vdsm is installed as >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> an upgrade. 
I >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> guess >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> restarting vdsm is a good idea after that? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Campbell >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> <[email protected]> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> you do not have to install it on all hosts. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> But you should have >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> more >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> than one and ideally all hosted engine enabled >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> nodes should >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> belong to >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> the same engine cluster. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Best regards >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Martin Sivak >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> On Wed, Jun 21, 2017 at 11:29 AM, cmc >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Hi Jenny, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Does ovirt-hosted-engine-ha need to be >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> installed across all >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> hosts? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Could that be the reason it is failing to see >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> it properly? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Cam >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> On Mon, Jun 19, 2017 at 1:27 PM, cmc >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Hi Jenny, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Logs are attached. I can see errors in >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> there, but am unsure how >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> they >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> arose. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Campbell >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> On Mon, Jun 19, 2017 at 12:29 PM, Evgenia >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Tokar >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> <[email protected]> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> From the output it looks like the agent is >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> down, try starting >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> it by >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> running: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> systemctl start ovirt-ha-agent. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> The engine is supposed to see the hosted >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> engine storage domain >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> import it >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> to the system, then it should import the >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> hosted engine vm. 
>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Can you attach the agent log from the host >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-hosted-engine-ha/agent.log) >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> and the engine log from the engine vm >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-engine/engine.log)? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Jenny >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> On Mon, Jun 19, 2017 at 12:41 PM, cmc >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> <[email protected]> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Hi Jenny, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What version are you running? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> 4.1.2.2-1.el7.centos >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> For the hosted engine vm to be imported >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> and displayed in the >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> engine, you >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> must first create a master storage domain. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> To provide a bit more detail: this was a >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> migration of a >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> bare-metal >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> engine in an existing cluster to a hosted >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> engine VM for that >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> cluster. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> As part of this migration, I built an >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> entirely new host and >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> ran >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> 'hosted-engine --deploy' (followed these >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> instructions: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_Metal_to_an_EL-Based_Self-Hosted_Environment/). >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> I restored the backup from the engine and >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> it completed >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> without any >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> errors. I didn't see any instructions >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> regarding a master >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> storage >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> domain in the page above. The cluster has >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> two existing master >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> storage >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> domains, one is fibre channel, which is >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> up, and one ISO >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> domain, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> which >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> is currently offline. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What do you mean the hosted engine >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> commands are failing? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> happens >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> when >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> you run hosted-engine --vm-status now? 
>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Interestingly, whereas when I ran it >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> before, it exited with >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> no >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> output >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> and a return code of '1', it now reports: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> --== Host 1 status ==-- >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> conf_on_shared_storage : True >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Status up-to-date : False >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Hostname : >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> kvm-ldn-03.ldn.fscfc.co.uk >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Host ID : 1 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Engine status : >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> unknown stale-data >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Score : 0 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> stopped : True >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Local maintenance : False >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> crc32 : >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> 0217f07b >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> local_conf_timestamp : 2911 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Host timestamp : 2897 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Extra metadata (valid at timestamp): >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> metadata_parse_version=1 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> metadata_feature_version=1 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> timestamp=2897 (Thu Jun 15 16:22:54 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> 2017) >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> host-id=1 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> score=0 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> vm_conf_refresh_time=2911 (Thu Jun >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> 15 16:23:08 2017) >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> conf_on_shared_storage=True >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> maintenance=False >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> state=AgentStopped >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> stopped=True >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Yet I can login to the web GUI fine. I >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> guess it is not HA due >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> being >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> in an unknown state currently? Does the >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> hosted-engine-ha rpm >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> need >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> be installed across all nodes in the >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> cluster, btw? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Thanks for the help, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Jenny Tokar >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jun 15, 2017 at 6:32 PM, cmc >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> <[email protected]> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I've migrated from a bare-metal engine >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> to a hosted engine. 
>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> There >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> were >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> no errors during the install, however, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> the hosted engine >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> did not >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> get >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> started. I tried running: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> hosted-engine --status >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> on the host I deployed it on, and it >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> returns nothing (exit >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> code >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> is 1 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> however). I could not ping it either. So >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I tried starting >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> it via >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> 'hosted-engine --vm-start' and it >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> returned: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Virtual machine does not exist >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> But it then became available. I logged >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> into it >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> successfully. It >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> is not >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> in the list of VMs however. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Any ideas why the hosted-engine commands >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> fail, and why it >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> is not >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> the list of virtual machines? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for any help, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Users mailing list >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> [email protected] >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Users mailing list >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> [email protected] >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users >>>>>>>>>>>>>>>>>>>>>>> >>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>> >>>>>> Users mailing list >>>>>>>>>>>>>>>>>>>>>>> >>>>>> [email protected] >>>>>>>>>>>>>>>>>>>>>>> >>>>>> http://lists.ovirt.org/mailman/listinfo/users >>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>> > _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>> > Users mailing list >>>>>>>>>>>>>>>>>>>>>>> > [email protected] >>>>>>>>>>>>>>>>>>>>>>> > http://lists.ovirt.org/mailman/listinfo/users >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> _______________________________________________ Users mailing list [email protected] http://lists.ovirt.org/mailman/listinfo/users

