I've had no choice but to power up the old bare-metal engine in order to start the VMs. This is probably really bad, but I had to get the VMs running. My guess now is that if a host is shut down rather than simply rebooted, the VMs will not restart when the host powers back up. This would not have been such a problem if the Hosted Engine had started.
So I'm not sure where to go from here... I guess it's a case of starting from scratch again? On Fri, Jun 30, 2017 at 3:19 PM, cmc <[email protected]> wrote: > Help! I put the cluster into global maintenance, then powered off and > powered back on all of the nodes. I have taken it out of global maintenance. No VM has started, > including the hosted engine. This is very bad. I am going to look > through logs to see why nothing has started. Help greatly appreciated. > > Thanks, > > Cam > > On Fri, Jun 30, 2017 at 1:00 PM, cmc <[email protected]> wrote: >> So I can run from any node: hosted-engine --set-maintenance >> --mode=global. By 'agents', you mean the ovirt-ha-agent, right? This >> shouldn't affect the running of any VMs, correct? Sorry for the >> questions, just want to do it correctly and not make assumptions :) >> >> Cheers, >> >> C >> >> On Fri, Jun 30, 2017 at 12:12 PM, Martin Sivak <[email protected]> wrote: >>> Hi, >>> >>>> Just to clarify: you mean the host_id in >>>> /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id, >>>> correct? >>> >>> Exactly. >>> >>> Put the cluster to global maintenance first. Or kill all agents (has >>> the same effect). >>> >>> Martin >>> >>> On Fri, Jun 30, 2017 at 12:47 PM, cmc <[email protected]> wrote: >>>> Just to clarify: you mean the host_id in >>>> /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id, >>>> correct? >>>> >>>> On Fri, Jun 30, 2017 at 9:47 AM, Martin Sivak <[email protected]> wrote: >>>>> Hi, >>>>> >>>>> cleaning metadata won't help in this case. Try transferring the >>>>> spm_ids you got from the engine to the proper hosted engine hosts so >>>>> the hosted engine ids match the spm_ids. Then restart all hosted >>>>> engine services. I would actually recommend restarting all hosts after >>>>> this change, but I have no idea how many VMs you have running.
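In other words, the host_id in /etc/ovirt-hosted-engine/hosted-engine.conf has to be unique per host and should match the SPM id the engine holds for that host. A rough sketch of how to line those up (the 'engine' database name and the vds_spm_id_map table are from memory, so double-check them on your version):

  # On the engine VM: what SPM id does the engine have for each host?
  sudo -u postgres psql engine -c "SELECT s.vds_name, m.vds_spm_id FROM vds_static s JOIN vds_spm_id_map m ON s.vds_id = m.vds_id;"

  # On each hosted-engine host: what id is the HA agent using?
  grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf

  # If they disagree, edit host_id to match the SPM id, then restart the HA services:
  systemctl restart ovirt-ha-broker ovirt-ha-agent

  # sanlock should then stop showing two hosts competing for the same id in the lockspace
  sanlock client status

That is only a sketch of the idea, not a tested procedure.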
>>>>> >>>>> Martin >>>>> >>>>> On Thu, Jun 29, 2017 at 8:27 PM, cmc <[email protected]> wrote: >>>>>> Tried running a 'hosted-engine --clean-metadata" as per >>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1350539, since >>>>>> ovirt-ha-agent was not running anyway, but it fails with the following >>>>>> error: >>>>>> >>>>>> ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed >>>>>> to start monitoring domain >>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>> during domain acquisition >>>>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Traceback (most recent >>>>>> call last): >>>>>> File >>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>>>> line 191, in _run_agent >>>>>> return action(he) >>>>>> File >>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>>>> line 67, in action_clean >>>>>> return he.clean(options.force_cleanup) >>>>>> File >>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>> line 345, in clean >>>>>> self._initialize_domain_monitor() >>>>>> File >>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>> line 823, in _initialize_domain_monitor >>>>>> raise Exception(msg) >>>>>> Exception: Failed to start monitoring domain >>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>> during domain acquisition >>>>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Trying to restart agent >>>>>> WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, >>>>>> attempt '0' >>>>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors >>>>>> occurred, giving up. Please review the log and consider filing a bug. >>>>>> INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down >>>>>> >>>>>> On Thu, Jun 29, 2017 at 6:10 PM, cmc <[email protected]> wrote: >>>>>>> Actually, it looks like sanlock problems: >>>>>>> >>>>>>> "SanlockInitializationError: Failed to initialize sanlock, the >>>>>>> number of errors has exceeded the limit" >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Thu, Jun 29, 2017 at 5:10 PM, cmc <[email protected]> wrote: >>>>>>>> Sorry, I am mistaken, two hosts failed for the agent with the >>>>>>>> following error: >>>>>>>> >>>>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine >>>>>>>> ERROR Failed to start monitoring domain >>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>>>> during domain acquisition >>>>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine >>>>>>>> ERROR Shutting down the agent because of 3 failures in a row! >>>>>>>> >>>>>>>> What could cause these timeouts? Some other service not running? >>>>>>>> >>>>>>>> On Thu, Jun 29, 2017 at 5:03 PM, cmc <[email protected]> wrote: >>>>>>>>> Both services are up on all three hosts. 
The broke logs just report: >>>>>>>>> >>>>>>>>> Thread-6549::INFO::2017-06-29 >>>>>>>>> 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) >>>>>>>>> Connection established >>>>>>>>> Thread-6549::INFO::2017-06-29 >>>>>>>>> 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) >>>>>>>>> Connection closed >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Cam >>>>>>>>> >>>>>>>>> On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak <[email protected]> >>>>>>>>> wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> please make sure that both ovirt-ha-agent and ovirt-ha-broker >>>>>>>>>> services >>>>>>>>>> are restarted and up. The error says the agent can't talk to the >>>>>>>>>> broker. Is there anything in the broker.log? >>>>>>>>>> >>>>>>>>>> Best regards >>>>>>>>>> >>>>>>>>>> Martin Sivak >>>>>>>>>> >>>>>>>>>> On Thu, Jun 29, 2017 at 4:42 PM, cmc <[email protected]> wrote: >>>>>>>>>>> I've restarted those two services across all hosts, have taken the >>>>>>>>>>> Hosted Engine host out of maintenance, and when I try to migrate the >>>>>>>>>>> Hosted Engine over to another host, it reports that all three hosts >>>>>>>>>>> 'did not satisfy internal filter HA because it is not a Hosted >>>>>>>>>>> Engine >>>>>>>>>>> host'. >>>>>>>>>>> >>>>>>>>>>> On the host that the Hosted Engine is currently on it reports in >>>>>>>>>>> the agent.log: >>>>>>>>>>> >>>>>>>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink >>>>>>>>>>> ERROR >>>>>>>>>>> Connection closed: Connection closed >>>>>>>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent >>>>>>>>>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception >>>>>>>>>>> getting service path: Connection closed >>>>>>>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent >>>>>>>>>>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most >>>>>>>>>>> recent >>>>>>>>>>> call last): >>>>>>>>>>> File >>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>>>>>>>>> line 191, in _run_agent >>>>>>>>>>> return >>>>>>>>>>> action(he) >>>>>>>>>>> File >>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>>>>>>>>> line 64, in action_proper >>>>>>>>>>> return >>>>>>>>>>> he.start_monitoring() >>>>>>>>>>> File >>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>> line 411, in start_monitoring >>>>>>>>>>> >>>>>>>>>>> self._initialize_sanlock() >>>>>>>>>>> File >>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>> line 691, in _initialize_sanlock >>>>>>>>>>> >>>>>>>>>>> constants.SERVICE_TYPE + constants.LOCKSPACE_EXTENSION) >>>>>>>>>>> File >>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", >>>>>>>>>>> line 162, in get_service_path >>>>>>>>>>> >>>>>>>>>>> .format(str(e))) >>>>>>>>>>> RequestError: >>>>>>>>>>> Failed >>>>>>>>>>> to get service path: Connection closed >>>>>>>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent >>>>>>>>>>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart >>>>>>>>>>> agent >>>>>>>>>>> >>>>>>>>>>> On Thu, Jun 29, 2017 at 1:25 PM, Martin Sivak <[email protected]> >>>>>>>>>>> wrote: >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> yep, you have to restart the ovirt-ha-agent and ovirt-ha-broker >>>>>>>>>>>> services. 
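For reference, the restart plus a quick health check on each hosted-engine host looks roughly like this (broker first, then the agent, since the agent talks to the broker):

  systemctl restart ovirt-ha-broker
  systemctl restart ovirt-ha-agent
  systemctl status ovirt-ha-broker ovirt-ha-agent

  # give the agent a minute to refresh the shared metadata, then:
  hosted-engine --vm-status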
>>>>>>>>>>>> >>>>>>>>>>>> The scheduling message just means that the host has score 0 or is >>>>>>>>>>>> not >>>>>>>>>>>> reporting score at all. >>>>>>>>>>>> >>>>>>>>>>>> Martin >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Jun 29, 2017 at 1:33 PM, cmc <[email protected]> wrote: >>>>>>>>>>>>> Thanks Martin, do I have to restart anything? When I try to use >>>>>>>>>>>>> the >>>>>>>>>>>>> 'migrate' operation, it complains that the other two hosts 'did >>>>>>>>>>>>> not >>>>>>>>>>>>> satisfy internal filter HA because it is not a Hosted Engine >>>>>>>>>>>>> host..' >>>>>>>>>>>>> (even though I reinstalled both these hosts with the 'deploy >>>>>>>>>>>>> hosted >>>>>>>>>>>>> engine' option, which suggests that something needs restarting. >>>>>>>>>>>>> Should >>>>>>>>>>>>> I worry about the sanlock errors, or will that be resolved by the >>>>>>>>>>>>> change in host_id? >>>>>>>>>>>>> >>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>> >>>>>>>>>>>>> Cam >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jun 29, 2017 at 12:22 PM, Martin Sivak >>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>> Change the ids so they are distinct. I need to check if there is >>>>>>>>>>>>>> a way >>>>>>>>>>>>>> to read the SPM ids from the engine as using the same numbers >>>>>>>>>>>>>> would be >>>>>>>>>>>>>> the best. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Martin >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Jun 29, 2017 at 12:46 PM, cmc <[email protected]> wrote: >>>>>>>>>>>>>>> Is there any way of recovering from this situation? I'd prefer >>>>>>>>>>>>>>> to fix >>>>>>>>>>>>>>> the issue rather than re-deploy, but if there is no recovery >>>>>>>>>>>>>>> path, I >>>>>>>>>>>>>>> could perhaps try re-deploying the hosted engine. In which >>>>>>>>>>>>>>> case, would >>>>>>>>>>>>>>> the best option be to take a backup of the Hosted Engine, and >>>>>>>>>>>>>>> then >>>>>>>>>>>>>>> shut it down, re-initialise the SAN partition (or use another >>>>>>>>>>>>>>> partition) and retry the deployment? Would it be better to use >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> older backup from the bare metal engine that I originally used, >>>>>>>>>>>>>>> or use >>>>>>>>>>>>>>> a backup from the Hosted Engine? I'm not sure if any VMs have >>>>>>>>>>>>>>> been >>>>>>>>>>>>>>> added since switching to Hosted Engine. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Unfortunately I have very little time left to get this working >>>>>>>>>>>>>>> before >>>>>>>>>>>>>>> I have to hand it over for eval (by end of Friday). 
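If it does come down to a re-deploy, taking a fresh backup on the engine VM first is cheap insurance; roughly (file names are just examples):

  engine-backup --mode=backup --scope=all --file=/root/engine-backup-20170630.tar.bz2 --log=/root/engine-backup-20170630.log

That archive is what would be restored on a freshly deployed engine VM with engine-backup --mode=restore, and unlike the old bare-metal backup it would also capture any VMs added since the switch to the hosted engine.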
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Here are some log snippets from the cluster that are current >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In /var/log/vdsm/vdsm.log on the host that has the Hosted >>>>>>>>>>>>>>> Engine: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2017-06-29 10:50:15,071+0100 INFO (monitor/207221b) >>>>>>>>>>>>>>> [storage.SANLock] >>>>>>>>>>>>>>> Acquiring host id for domain >>>>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f (id: >>>>>>>>>>>>>>> 3) (clusterlock:282) >>>>>>>>>>>>>>> 2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) >>>>>>>>>>>>>>> [storage.Monitor] >>>>>>>>>>>>>>> Error acquiring host id 3 for domain >>>>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f (monitor:558) >>>>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>>>> File "/usr/share/vdsm/storage/monitor.py", line 555, in >>>>>>>>>>>>>>> _acquireHostId >>>>>>>>>>>>>>> self.domain.acquireHostId(self.hostId, async=True) >>>>>>>>>>>>>>> File "/usr/share/vdsm/storage/sd.py", line 790, in >>>>>>>>>>>>>>> acquireHostId >>>>>>>>>>>>>>> self._manifest.acquireHostId(hostId, async) >>>>>>>>>>>>>>> File "/usr/share/vdsm/storage/sd.py", line 449, in >>>>>>>>>>>>>>> acquireHostId >>>>>>>>>>>>>>> self._domainLock.acquireHostId(hostId, async) >>>>>>>>>>>>>>> File >>>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", >>>>>>>>>>>>>>> line 297, in acquireHostId >>>>>>>>>>>>>>> raise se.AcquireHostIdFailure(self._sdUUID, e) >>>>>>>>>>>>>>> AcquireHostIdFailure: Cannot acquire host id: >>>>>>>>>>>>>>> ('207221b2-959b-426b-b945-18e1adfed62f', SanlockException(22, >>>>>>>>>>>>>>> 'Sanlock >>>>>>>>>>>>>>> lockspace add failure', 'Invalid argument')) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> From /var/log/ovirt-hosted-engine-ha/agent.log on the same host: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>>>>>>> 13:30:50,592::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) >>>>>>>>>>>>>>> Failed to start monitoring domain >>>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): >>>>>>>>>>>>>>> timeout >>>>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>>>> MainThread::WARNING::2017-06-19 >>>>>>>>>>>>>>> 13:30:50,593::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>>>> Error while monitoring engine: Failed to start monitoring domain >>>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): >>>>>>>>>>>>>>> timeout >>>>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>>>> MainThread::WARNING::2017-06-19 >>>>>>>>>>>>>>> 13:30:50,593::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>>>> Unexpected error >>>>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>>>> File >>>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>>>>>> line 443, in start_monitoring >>>>>>>>>>>>>>> self._initialize_domain_monitor() >>>>>>>>>>>>>>> File >>>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>>>>>> line 823, in _initialize_domain_monitor >>>>>>>>>>>>>>> raise Exception(msg) >>>>>>>>>>>>>>> Exception: Failed to start monitoring domain >>>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): >>>>>>>>>>>>>>> timeout >>>>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>>>>>>> 
13:30:50,593::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>>>> Shutting down the agent because of 3 failures in a row! >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> From sanlock.log: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2017-06-29 11:17:06+0100 1194149 [2530]: add_lockspace >>>>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 >>>>>>>>>>>>>>> conflicts with name of list1 s5 >>>>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> From the two other hosts: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> host 2: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> vdsm.log >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2017-06-29 10:53:47,755+0100 ERROR (jsonrpc/4) >>>>>>>>>>>>>>> [jsonrpc.JsonRpcServer] >>>>>>>>>>>>>>> Internal server error (__init__:570) >>>>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>>>> File >>>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line >>>>>>>>>>>>>>> 565, in _handle_request >>>>>>>>>>>>>>> res = method(**params) >>>>>>>>>>>>>>> File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", >>>>>>>>>>>>>>> line >>>>>>>>>>>>>>> 202, in _dynamicMethod >>>>>>>>>>>>>>> result = fn(*methodArgs) >>>>>>>>>>>>>>> File "/usr/share/vdsm/API.py", line 1454, in >>>>>>>>>>>>>>> getAllVmIoTunePolicies >>>>>>>>>>>>>>> io_tune_policies_dict = self._cif.getAllVmIoTunePolicies() >>>>>>>>>>>>>>> File "/usr/share/vdsm/clientIF.py", line 448, in >>>>>>>>>>>>>>> getAllVmIoTunePolicies >>>>>>>>>>>>>>> 'current_values': v.getIoTune()} >>>>>>>>>>>>>>> File "/usr/share/vdsm/virt/vm.py", line 2803, in getIoTune >>>>>>>>>>>>>>> result = self.getIoTuneResponse() >>>>>>>>>>>>>>> File "/usr/share/vdsm/virt/vm.py", line 2816, in >>>>>>>>>>>>>>> getIoTuneResponse >>>>>>>>>>>>>>> res = self._dom.blockIoTune( >>>>>>>>>>>>>>> File >>>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line >>>>>>>>>>>>>>> 47, in __getattr__ >>>>>>>>>>>>>>> % self.vmid) >>>>>>>>>>>>>>> NotConnectedError: VM u'a79e6b0e-fff4-4cba-a02c-4c00be151300' >>>>>>>>>>>>>>> was not >>>>>>>>>>>>>>> started yet or was shut down >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>>> 10:56:33,636::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) >>>>>>>>>>>>>>> Found OVF_STORE: imgUUID:222610db-7880-4f4f-8559-a3635fd73555, >>>>>>>>>>>>>>> volUUID:c6e0d29b-eabf-4a09-a330-df54cfdd73f1 >>>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>>> 10:56:33,926::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>>>>>>>>>>> Extracting Engine VM OVF from the OVF_STORE >>>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>>> 10:56:33,938::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>>>>>>>>>>> OVF_STORE volume path: >>>>>>>>>>>>>>> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/images/222610db-7880-4f4f-8559-a3635fd73555/c6e0d29b-eabf-4a09-a330-df54cfdd73f1 >>>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>>> 10:56:33,967::config::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>>>>>>>>>>> Found an OVF for HE VM, trying to convert >>>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>>> 
10:56:33,971::config::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>>>>>>>>>>> Got vm.conf from OVF_STORE >>>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>>> 10:56:36,736::states::678::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) >>>>>>>>>>>>>>> Score is 0 due to unexpected vm shutdown at Thu Jun 29 10:53:59 >>>>>>>>>>>>>>> 2017 >>>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>>> 10:56:36,736::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>>>> Current state EngineUnexpectedlyDown (score: 0) >>>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>>> 10:56:46,772::config::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) >>>>>>>>>>>>>>> Reloading vm.conf from the shared storage domain >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /var/log/messages: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Jun 29 10:53:46 kvm-ldn-02 kernel: dd: sending ioctl 80306d02 >>>>>>>>>>>>>>> to a partition! >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> host 1: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /var/log/messages also in sanlock.log >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Jun 29 11:01:02 kvm-ldn-01 sanlock[2400]: 2017-06-29 >>>>>>>>>>>>>>> 11:01:02+0100 >>>>>>>>>>>>>>> 678325 [9132]: s4531 delta_acquire host_id 1 busy1 1 2 1193177 >>>>>>>>>>>>>>> 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03 >>>>>>>>>>>>>>> Jun 29 11:01:03 kvm-ldn-01 sanlock[2400]: 2017-06-29 >>>>>>>>>>>>>>> 11:01:03+0100 >>>>>>>>>>>>>>> 678326 [24159]: s4531 add_lockspace fail result -262 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> MainThread::ERROR::2017-06-27 >>>>>>>>>>>>>>> 15:21:01,143::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) >>>>>>>>>>>>>>> Failed to start monitoring domain >>>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): >>>>>>>>>>>>>>> timeout >>>>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>>>> MainThread::WARNING::2017-06-27 >>>>>>>>>>>>>>> 15:21:01,144::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>>>> Error while monitoring engine: Failed to start monitoring domain >>>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): >>>>>>>>>>>>>>> timeout >>>>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>>>> MainThread::WARNING::2017-06-27 >>>>>>>>>>>>>>> 15:21:01,144::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>>>> Unexpected error >>>>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>>>> File >>>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>>>>>> line 443, in start_monitoring >>>>>>>>>>>>>>> self._initialize_domain_monitor() >>>>>>>>>>>>>>> File >>>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>>>>>> line 823, in _initialize_domain_monitor >>>>>>>>>>>>>>> raise Exception(msg) >>>>>>>>>>>>>>> Exception: Failed to start monitoring domain >>>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): >>>>>>>>>>>>>>> timeout >>>>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>>>> MainThread::ERROR::2017-06-27 >>>>>>>>>>>>>>> 15:21:01,144::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) 
>>>>>>>>>>>>>>> Shutting down the agent because of 3 failures in a row! >>>>>>>>>>>>>>> MainThread::INFO::2017-06-27 >>>>>>>>>>>>>>> 15:21:06,717::hosted_engine::848::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) >>>>>>>>>>>>>>> VDSM domain monitor status: PENDING >>>>>>>>>>>>>>> MainThread::INFO::2017-06-27 >>>>>>>>>>>>>>> 15:21:09,335::hosted_engine::776::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) >>>>>>>>>>>>>>> Failed to stop monitoring domain >>>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f): Storage domain >>>>>>>>>>>>>>> is >>>>>>>>>>>>>>> member of pool: u'domain=207221b2-959b-426b-b945-18e1adfed62f' >>>>>>>>>>>>>>> MainThread::INFO::2017-06-27 >>>>>>>>>>>>>>> 15:21:09,339::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(run) >>>>>>>>>>>>>>> Agent shutting down >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks for any help, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Jun 28, 2017 at 11:25 AM, cmc <[email protected]> wrote: >>>>>>>>>>>>>>>> Hi Martin, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> yes, on two of the machines they have the same host_id. The >>>>>>>>>>>>>>>> other has >>>>>>>>>>>>>>>> a different host_id. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> To update since yesterday: I reinstalled and deployed Hosted >>>>>>>>>>>>>>>> Engine on >>>>>>>>>>>>>>>> the other host (so all three hosts in the cluster now have it >>>>>>>>>>>>>>>> installed). The second one I deployed said it was able to host >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> engine (unlike the first I reinstalled), so I tried putting >>>>>>>>>>>>>>>> the host >>>>>>>>>>>>>>>> with the Hosted Engine on it into maintenance to see if it >>>>>>>>>>>>>>>> would >>>>>>>>>>>>>>>> migrate over. It managed to move all hosts but the Hosted >>>>>>>>>>>>>>>> Engine. And >>>>>>>>>>>>>>>> now the host that said it was able to host the engine says >>>>>>>>>>>>>>>> 'unavailable due to HA score'. The host that it was trying to >>>>>>>>>>>>>>>> move >>>>>>>>>>>>>>>> from is now in 'preparing for maintenance' for the last 12 >>>>>>>>>>>>>>>> hours. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The summary is: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> kvm-ldn-01 - one of the original, pre-Hosted Engine hosts, >>>>>>>>>>>>>>>> reinstalled >>>>>>>>>>>>>>>> with 'Deploy Hosted Engine'. No icon saying it can host the >>>>>>>>>>>>>>>> Hosted >>>>>>>>>>>>>>>> Hngine, host_id of '2' in >>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf. >>>>>>>>>>>>>>>> 'add_lockspace' fails in sanlock.log >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> kvm-ldn-02 - the other host that was pre-existing before >>>>>>>>>>>>>>>> Hosted Engine >>>>>>>>>>>>>>>> was created. Reinstalled with 'Deploy Hosted Engine'. Had an >>>>>>>>>>>>>>>> icon >>>>>>>>>>>>>>>> saying that it was able to host the Hosted Engine, but after >>>>>>>>>>>>>>>> migration >>>>>>>>>>>>>>>> was attempted when putting kvm-ldn-03 into maintenance, it >>>>>>>>>>>>>>>> reports: >>>>>>>>>>>>>>>> 'unavailable due to HA score'. It has a host_id of '1' in >>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf. No errors in >>>>>>>>>>>>>>>> sanlock.log >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> kvm-ldn-03 - this was the host I deployed Hosted Engine on, >>>>>>>>>>>>>>>> which was >>>>>>>>>>>>>>>> not part of the original cluster. 
I restored the bare-metal >>>>>>>>>>>>>>>> engine >>>>>>>>>>>>>>>> backup in the Hosted Engine on this host when deploying it, >>>>>>>>>>>>>>>> without >>>>>>>>>>>>>>>> error. It currently has the Hosted Engine on it (as the only >>>>>>>>>>>>>>>> VM after >>>>>>>>>>>>>>>> I put that host into maintenance to test the HA of Hosted >>>>>>>>>>>>>>>> Engine). >>>>>>>>>>>>>>>> Sanlock log shows conflicts >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I will look through all the logs for any other errors. Please >>>>>>>>>>>>>>>> let me >>>>>>>>>>>>>>>> know if you need any logs or other clarification/information. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Campbell >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Jun 28, 2017 at 9:25 AM, Martin Sivak >>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> can you please check the contents of >>>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf or >>>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine-ha/agent.conf (I am not sure which >>>>>>>>>>>>>>>>> one it is >>>>>>>>>>>>>>>>> right now) and search for host-id? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Make sure the IDs are different. If they are not, then there >>>>>>>>>>>>>>>>> is a bug somewhere. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Martin >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 6:26 PM, cmc <[email protected]> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> I see this on the host it is trying to migrate in >>>>>>>>>>>>>>>>>> /var/log/sanlock: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 2017-06-27 17:10:40+0100 527703 [2407]: s3528 lockspace >>>>>>>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 >>>>>>>>>>>>>>>>>> 2017-06-27 17:13:00+0100 527843 [27446]: s3528 delta_acquire >>>>>>>>>>>>>>>>>> host_id 1 >>>>>>>>>>>>>>>>>> busy1 1 2 1042692 >>>>>>>>>>>>>>>>>> 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03 >>>>>>>>>>>>>>>>>> 2017-06-27 17:13:01+0100 527844 [2407]: s3528 add_lockspace >>>>>>>>>>>>>>>>>> fail result -262 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The sanlock service is running. Why would this occur? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> C >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 5:21 PM, cmc <[email protected]> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>> Hi Martin, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks for the reply. I have done this, and the deployment >>>>>>>>>>>>>>>>>>> completed >>>>>>>>>>>>>>>>>>> without error. However, it still will not allow the Hosted >>>>>>>>>>>>>>>>>>> Engine >>>>>>>>>>>>>>>>>>> migrate to another host. The >>>>>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf got created ok >>>>>>>>>>>>>>>>>>> on the host >>>>>>>>>>>>>>>>>>> I re-installed, but the ovirt-ha-broker.service, though it >>>>>>>>>>>>>>>>>>> starts, >>>>>>>>>>>>>>>>>>> reports: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> --------------------8<------------------- >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Jun 27 14:58:26 kvm-ldn-01 systemd[1]: Starting oVirt >>>>>>>>>>>>>>>>>>> Hosted Engine >>>>>>>>>>>>>>>>>>> High Availability Communications Broker... 
>>>>>>>>>>>>>>>>>>> Jun 27 14:58:27 kvm-ldn-01 ovirt-ha-broker[6101]: >>>>>>>>>>>>>>>>>>> ovirt-ha-broker >>>>>>>>>>>>>>>>>>> ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker >>>>>>>>>>>>>>>>>>> ERROR >>>>>>>>>>>>>>>>>>> Failed to read metadata from >>>>>>>>>>>>>>>>>>> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata >>>>>>>>>>>>>>>>>>> Traceback >>>>>>>>>>>>>>>>>>> (most >>>>>>>>>>>>>>>>>>> recent call last): >>>>>>>>>>>>>>>>>>> File >>>>>>>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", >>>>>>>>>>>>>>>>>>> line 129, in get_raw_stats_for_service_type >>>>>>>>>>>>>>>>>>> f = >>>>>>>>>>>>>>>>>>> os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC) >>>>>>>>>>>>>>>>>>> OSError: >>>>>>>>>>>>>>>>>>> [Errno 2] >>>>>>>>>>>>>>>>>>> No such file or directory: >>>>>>>>>>>>>>>>>>> '/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata' >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> --------------------8<------------------- >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I checked the path, and it exists. I can run 'less -f' on >>>>>>>>>>>>>>>>>>> it fine. The >>>>>>>>>>>>>>>>>>> perms are slightly different on the host that is running >>>>>>>>>>>>>>>>>>> the VM vs the >>>>>>>>>>>>>>>>>>> one that is reporting errors (600 vs 660), ownership is >>>>>>>>>>>>>>>>>>> vdsm:qemu. Is >>>>>>>>>>>>>>>>>>> this a san locking issue? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks for any help, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:41 PM, Martin Sivak >>>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> Should it be? It was not in the instructions for the >>>>>>>>>>>>>>>>>>>>> migration from >>>>>>>>>>>>>>>>>>>>> bare-metal to Hosted VM >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The hosted engine will only migrate to hosts that have the >>>>>>>>>>>>>>>>>>>> services >>>>>>>>>>>>>>>>>>>> running. Please put one other host to maintenance and >>>>>>>>>>>>>>>>>>>> select Hosted >>>>>>>>>>>>>>>>>>>> engine action: DEPLOY in the reinstall dialog. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Best regards >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Martin Sivak >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:23 PM, cmc <[email protected]> >>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> I changed the >>>>>>>>>>>>>>>>>>>>> 'os.other.devices.display.protocols.value.3.6 = >>>>>>>>>>>>>>>>>>>>> spice/qxl,vnc/cirrus,vnc/qxl' line to have the same >>>>>>>>>>>>>>>>>>>>> display protocols >>>>>>>>>>>>>>>>>>>>> as 4 and the hosted engine now appears in the list of >>>>>>>>>>>>>>>>>>>>> VMs. I am >>>>>>>>>>>>>>>>>>>>> guessing the compatibility version was causing it to use >>>>>>>>>>>>>>>>>>>>> the 3.6 >>>>>>>>>>>>>>>>>>>>> version. However, I am still unable to migrate the engine >>>>>>>>>>>>>>>>>>>>> VM to >>>>>>>>>>>>>>>>>>>>> another host. When I try putting the host it is currently >>>>>>>>>>>>>>>>>>>>> on into >>>>>>>>>>>>>>>>>>>>> maintenance, it reports: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Error while executing action: Cannot switch the Host(s) >>>>>>>>>>>>>>>>>>>>> to Maintenance mode. >>>>>>>>>>>>>>>>>>>>> There are no available hosts capable of running the >>>>>>>>>>>>>>>>>>>>> engine VM. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Running 'hosted-engine --vm-status' still shows 'Engine >>>>>>>>>>>>>>>>>>>>> status: >>>>>>>>>>>>>>>>>>>>> unknown stale-data'. 
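That 'unknown stale-data' state normally just means nothing is refreshing the hosted-engine metadata on the shared storage, which fits the broker error above. On block storage the entries under ha_agent should be symlinks into the domain, so comparing them (and their permissions) between the working host and a broken one is a quick sanity check; using the path from the error above:

  ls -l /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/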
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The ovirt-ha-broker service is only running on one host. >>>>>>>>>>>>>>>>>>>>> It was set to >>>>>>>>>>>>>>>>>>>>> 'disabled' in systemd. It won't start as there is no >>>>>>>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf on the other >>>>>>>>>>>>>>>>>>>>> two hosts. >>>>>>>>>>>>>>>>>>>>> Should it be? It was not in the instructions for the >>>>>>>>>>>>>>>>>>>>> migration from >>>>>>>>>>>>>>>>>>>>> bare-metal to Hosted VM >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 1:07 PM, cmc <[email protected]> >>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>> Hi Tomas, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> So in my >>>>>>>>>>>>>>>>>>>>>> /usr/share/ovirt-engine/conf/osinfo-defaults.properties >>>>>>>>>>>>>>>>>>>>>> on my >>>>>>>>>>>>>>>>>>>>>> engine VM, I have: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> os.other.devices.display.protocols.value = >>>>>>>>>>>>>>>>>>>>>> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus >>>>>>>>>>>>>>>>>>>>>> os.other.devices.display.protocols.value.3.6 = >>>>>>>>>>>>>>>>>>>>>> spice/qxl,vnc/cirrus,vnc/qxl >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> That seems to match - I assume since this is 4.1, the >>>>>>>>>>>>>>>>>>>>>> 3.6 should not apply >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Is there somewhere else I should be looking? >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek >>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek >>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> > On 22 Jun 2017, at 12:31, Martin Sivak >>>>>>>>>>>>>>>>>>>>>>>> > <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> > Tomas, what fields are needed in a VM to pass the >>>>>>>>>>>>>>>>>>>>>>>> > check that causes >>>>>>>>>>>>>>>>>>>>>>>> > the following error? >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> >>>>> WARN >>>>>>>>>>>>>>>>>>>>>>>> >>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>>>>>>>>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] >>>>>>>>>>>>>>>>>>>>>>>> >>>>> Validation of action >>>>>>>>>>>>>>>>>>>>>>>> >>>>> 'ImportVm' >>>>>>>>>>>>>>>>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: >>>>>>>>>>>>>>>>>>>>>>>> >>>>> VAR__ACTION__IMPORT >>>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> to match the OS and VM Display type;-) >>>>>>>>>>>>>>>>>>>>>>>> Configuration is in osinfo….e.g. 
if that is import >>>>>>>>>>>>>>>>>>>>>>>> from older releases on >>>>>>>>>>>>>>>>>>>>>>>> Linux this is typically caused by the cahgen of cirrus >>>>>>>>>>>>>>>>>>>>>>>> to vga for non-SPICE >>>>>>>>>>>>>>>>>>>>>>>> VMs >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> yep, the default supported combinations for 4.0+ is >>>>>>>>>>>>>>>>>>>>>>> this: >>>>>>>>>>>>>>>>>>>>>>> os.other.devices.display.protocols.value = >>>>>>>>>>>>>>>>>>>>>>> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> > Thanks. >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> > On Thu, Jun 22, 2017 at 12:19 PM, cmc >>>>>>>>>>>>>>>>>>>>>>>> > <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >> Hi Martin, >>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>>>>> >>> just as a random comment, do you still have the >>>>>>>>>>>>>>>>>>>>>>>> >>> database backup from >>>>>>>>>>>>>>>>>>>>>>>> >>> the bare metal -> VM attempt? It might be possible >>>>>>>>>>>>>>>>>>>>>>>> >>> to just try again >>>>>>>>>>>>>>>>>>>>>>>> >>> using it. Or in the worst case.. update the >>>>>>>>>>>>>>>>>>>>>>>> >>> offending value there >>>>>>>>>>>>>>>>>>>>>>>> >>> before restoring it to the new engine instance. >>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>> >> I still have the backup. I'd rather do the latter, >>>>>>>>>>>>>>>>>>>>>>>> >> as re-running the >>>>>>>>>>>>>>>>>>>>>>>> >> HE deployment is quite lengthy and involved (I have >>>>>>>>>>>>>>>>>>>>>>>> >> to re-initialise >>>>>>>>>>>>>>>>>>>>>>>> >> the FC storage each time). Do you know what the >>>>>>>>>>>>>>>>>>>>>>>> >> offending value(s) >>>>>>>>>>>>>>>>>>>>>>>> >> would be? Would it be in the Postgres DB or in a >>>>>>>>>>>>>>>>>>>>>>>> >> config file >>>>>>>>>>>>>>>>>>>>>>>> >> somewhere? >>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>> >> Cheers, >>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>> >> Cam >>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>> >>> Regards >>>>>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>>>>> >>> Martin Sivak >>>>>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>>>>> >>> On Thu, Jun 22, 2017 at 11:39 AM, cmc >>>>>>>>>>>>>>>>>>>>>>>> >>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>> Hi Yanir, >>>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>> Thanks for the reply. >>>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>> First of all, maybe a chain reaction of : >>>>>>>>>>>>>>>>>>>>>>>> >>>>> WARN >>>>>>>>>>>>>>>>>>>>>>>> >>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>>>>>>>>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] >>>>>>>>>>>>>>>>>>>>>>>> >>>>> Validation of action >>>>>>>>>>>>>>>>>>>>>>>> >>>>> 'ImportVm' >>>>>>>>>>>>>>>>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: >>>>>>>>>>>>>>>>>>>>>>>> >>>>> VAR__ACTION__IMPORT >>>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS >>>>>>>>>>>>>>>>>>>>>>>> >>>>> is causing the hosted engine vm not to be set up >>>>>>>>>>>>>>>>>>>>>>>> >>>>> correctly and >>>>>>>>>>>>>>>>>>>>>>>> >>>>> further >>>>>>>>>>>>>>>>>>>>>>>> >>>>> actions were made when the hosted engine vm >>>>>>>>>>>>>>>>>>>>>>>> >>>>> wasnt in a stable state. 
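On the osinfo side: rather than editing /usr/share/ovirt-engine/conf/osinfo-defaults.properties in place (it can be overwritten by upgrades), the usual pattern is a small override file on the engine VM; a sketch, assuming the override directory is /etc/ovirt-engine/osinfo.conf.d/ and using the 4.0+ default value Tomas quoted (the file name itself is arbitrary):

  echo 'os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus' > /etc/ovirt-engine/osinfo.conf.d/90-display-protocols.properties
  systemctl restart ovirt-engine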
>>>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>> As for now, are you trying to revert back to a >>>>>>>>>>>>>>>>>>>>>>>> >>>>> previous/initial >>>>>>>>>>>>>>>>>>>>>>>> >>>>> state ? >>>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>> I'm not trying to revert it to a previous state >>>>>>>>>>>>>>>>>>>>>>>> >>>> for now. This was a >>>>>>>>>>>>>>>>>>>>>>>> >>>> migration from a bare metal engine, and it didn't >>>>>>>>>>>>>>>>>>>>>>>> >>>> report any error >>>>>>>>>>>>>>>>>>>>>>>> >>>> during the migration. I'd had some problems on my >>>>>>>>>>>>>>>>>>>>>>>> >>>> first attempts at >>>>>>>>>>>>>>>>>>>>>>>> >>>> this migration, whereby it never completed (due >>>>>>>>>>>>>>>>>>>>>>>> >>>> to a proxy issue) but >>>>>>>>>>>>>>>>>>>>>>>> >>>> I managed to resolve this. Do you know of a way >>>>>>>>>>>>>>>>>>>>>>>> >>>> to get the Hosted >>>>>>>>>>>>>>>>>>>>>>>> >>>> Engine VM into a stable state, without rebuilding >>>>>>>>>>>>>>>>>>>>>>>> >>>> the entire cluster >>>>>>>>>>>>>>>>>>>>>>>> >>>> from scratch (since I have a lot of VMs on it)? >>>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>> Thanks for any help. >>>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>> Regards, >>>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>> Cam >>>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>> Regards, >>>>>>>>>>>>>>>>>>>>>>>> >>>>> Yanir >>>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>> On Wed, Jun 21, 2017 at 4:32 PM, cmc >>>>>>>>>>>>>>>>>>>>>>>> >>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> Hi Jenny/Martin, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> Any idea what I can do here? The hosted engine >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> VM has no log on any >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> host in /var/log/libvirt/qemu, and I fear that >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> if I need to put the >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> host into maintenance, e.g., to upgrade it that >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> I created it on >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> (which >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> I think is hosting it), or if it fails for any >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> reason, it won't get >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> migrated to another host, and I will not be >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> able to manage the >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> cluster. It seems to be a very dangerous >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> position to be in. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> Cam >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> On Wed, Jun 21, 2017 at 11:48 AM, cmc >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Thanks Martin. The hosts are all part of the >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> same cluster. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> I get these errors in the engine.log on the >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> engine: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z WARN >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Validation of action >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 'ImportVm' >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> failed for user SYST >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> EM. 
Reasons: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z INFO >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Lock >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> freed to object >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 'EngineLock:{exclusiveLocks='[a >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 79e6b0e-fff4-4cba-a02c-4c00be151300=<VM, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> HostedEngine>, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> HostedEngine=<VM_NAME, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> ACTION_TYPE_FAILED_NAME_ALREADY_USED>]', >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> sharedLocks= >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> '[a79e6b0e-fff4-4cba-a02c-4c00be151300=<REMOTE_VM, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> HostedEngine>]'}' >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z ERROR >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.HostedEngineImporter] >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Failed >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> importing the Hosted >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Engine VM >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> The sanlock.log reports conflicts on that same >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> host, and a >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> different >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> error on the other hosts, not sure if they are >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> related. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> And this in the >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> /var/log/ovirt-hosted-engine-ha/agent log on >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> host >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> which I deployed the hosted engine VM on: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Unable to extract HEVM OVF >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Failed extracting VM OVF from the OVF_STORE >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> volume, falling back >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> to >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> initial vm.conf >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> I've seen some of these issues reported in >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> bugzilla, but they were >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> for >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> older versions of oVirt (and appear to be >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> resolved). 
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> I will install that package on the other two >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> hosts, for which I >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> will >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> put them in maintenance as vdsm is installed >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> as an upgrade. I >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> guess >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> restarting vdsm is a good idea after that? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Campbell >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> <[email protected]> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> you do not have to install it on all hosts. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> But you should have >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> more >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> than one and ideally all hosted engine >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> enabled nodes should >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> belong to >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> the same engine cluster. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Best regards >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Martin Sivak >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> On Wed, Jun 21, 2017 at 11:29 AM, cmc >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Hi Jenny, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Does ovirt-hosted-engine-ha need to be >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> installed across all >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> hosts? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Could that be the reason it is failing to >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> see it properly? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Cam >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> On Mon, Jun 19, 2017 at 1:27 PM, cmc >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Hi Jenny, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Logs are attached. I can see errors in >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> there, but am unsure how >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> they >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> arose. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Campbell >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> On Mon, Jun 19, 2017 at 12:29 PM, Evgenia >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Tokar >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> <[email protected]> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> From the output it looks like the agent is >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> down, try starting >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> it by >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> running: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> systemctl start ovirt-ha-agent. 
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> The engine is supposed to see the hosted >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> engine storage domain >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> import it >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> to the system, then it should import the >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> hosted engine vm. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Can you attach the agent log from the host >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-hosted-engine-ha/agent.log) >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> and the engine log from the engine vm >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-engine/engine.log)? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Jenny >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> On Mon, Jun 19, 2017 at 12:41 PM, cmc >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> <[email protected]> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Hi Jenny, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What version are you running? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> 4.1.2.2-1.el7.centos >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> For the hosted engine vm to be imported >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> and displayed in the >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> engine, you >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> must first create a master storage >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> domain. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> To provide a bit more detail: this was a >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> migration of a >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> bare-metal >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> engine in an existing cluster to a hosted >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> engine VM for that >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> cluster. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> As part of this migration, I built an >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> entirely new host and >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> ran >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> 'hosted-engine --deploy' (followed these >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> instructions: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_Metal_to_an_EL-Based_Self-Hosted_Environment/). >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> I restored the backup from the engine and >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> it completed >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> without any >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> errors. I didn't see any instructions >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> regarding a master >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> storage >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> domain in the page above. 
The cluster has >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> two existing master >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> storage >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> domains, one is fibre channel, which is >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> up, and one ISO >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> domain, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> which >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> is currently offline. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What do you mean the hosted engine >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> commands are failing? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> happens >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> when >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> you run hosted-engine --vm-status now? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Interestingly, whereas when I ran it >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> before, it exited with >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> no >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> output >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> and a return code of '1', it now reports: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> --== Host 1 status ==-- >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> conf_on_shared_storage : True >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Status up-to-date : False >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Hostname : >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> kvm-ldn-03.ldn.fscfc.co.uk >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Host ID : 1 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Engine status : >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> unknown stale-data >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Score : 0 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> stopped : True >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Local maintenance : False >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> crc32 : >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> 0217f07b >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> local_conf_timestamp : 2911 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Host timestamp : 2897 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Extra metadata (valid at timestamp): >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> metadata_parse_version=1 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> metadata_feature_version=1 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> timestamp=2897 (Thu Jun 15 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> 16:22:54 2017) >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> host-id=1 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> score=0 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> vm_conf_refresh_time=2911 (Thu Jun >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> 15 16:23:08 2017) >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> conf_on_shared_storage=True >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> maintenance=False >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> state=AgentStopped >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> stopped=True >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Yet I can login to the web GUI fine. I >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> guess it is not HA due >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> being >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> in an unknown state currently? Does the >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> hosted-engine-ha rpm >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> need >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> be installed across all nodes in the >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> cluster, btw? 
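A quick way to see which hosts actually have the HA pieces in place (--check-deployed may not exist on every version, so treat that one as optional):

  rpm -q ovirt-hosted-engine-ha ovirt-hosted-engine-setup
  systemctl is-enabled ovirt-ha-agent ovirt-ha-broker
  hosted-engine --check-deployed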
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Thanks for the help, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Jenny Tokar >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jun 15, 2017 at 6:32 PM, cmc >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> <[email protected]> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I've migrated from a bare-metal engine >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> to a hosted engine. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> There >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> were >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> no errors during the install, however, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> the hosted engine >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> did not >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> get >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> started. I tried running: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> hosted-engine --status >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> on the host I deployed it on, and it >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> returns nothing (exit >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> code >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> is 1 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> however). I could not ping it either. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> So I tried starting >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> it via >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> 'hosted-engine --vm-start' and it >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> returned: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Virtual machine does not exist >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> But it then became available. I logged >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> into it >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> successfully. It >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> is not >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> in the list of VMs however. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Any ideas why the hosted-engine >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> commands fail, and why it >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> is not >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> the list of virtual machines? 
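When the engine's VM list and reality disagree, checking directly on the host usually settles it; read-only virsh works without vdsm credentials:

  virsh -r list --all        # is HostedEngine actually running under libvirt?
  hosted-engine --vm-status  # what the HA agent thinks is going on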
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for any help, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Users mailing list >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> [email protected] >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Users mailing list >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> [email protected] >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> Users mailing list >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> [email protected] >>>>>>>>>>>>>>>>>>>>>>>> >>>>>> http://lists.ovirt.org/mailman/listinfo/users >>>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>>> > _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>>> > Users mailing list >>>>>>>>>>>>>>>>>>>>>>>> > [email protected] >>>>>>>>>>>>>>>>>>>>>>>> > http://lists.ovirt.org/mailman/listinfo/users >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> _______________________________________________ Users mailing list [email protected] http://lists.ovirt.org/mailman/listinfo/users

