Help! I put the cluster into global maintenance, then powered off and powered on all of the nodes. I have taken it out of global maintenance. No VM has started, including the hosted engine. This is very bad. I am going to look through logs to see why nothing has started. Help greatly appreciated.
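A minimal first-pass triage for this state (a sketch only, assuming the service names that appear later in this thread):

--------------------8<-------------------
# on each hosted-engine host: did the HA stack come back after the power cycle?
systemctl status ovirt-ha-broker ovirt-ha-agent vdsmd sanlock
# make sure global maintenance is really off, then check what each host reports
hosted-engine --set-maintenance --mode=none
hosted-engine --vm-status
# as a last resort, start the engine VM by hand on one host
hosted-engine --vm-start
--------------------8<-------------------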
Thanks, Cam On Fri, Jun 30, 2017 at 1:00 PM, cmc <[email protected]> wrote: > So I can run from any node: hosted-engine --set-maintenance > --mode=global. By 'agents', you mean the ovirt-ha-agent, right? This > shouldn't affect the running of any VMs, correct? Sorry for the > questions, just want to do it correctly and not make assumptions :) > > Cheers, > > C > > On Fri, Jun 30, 2017 at 12:12 PM, Martin Sivak <[email protected]> wrote: >> Hi, >> >>> Just to clarify: you mean the host_id in >>> /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id, >>> correct? >> >> Exactly. >> >> Put the cluster to global maintenance first. Or kill all agents (has >> the same effect). >> >> Martin >> >> On Fri, Jun 30, 2017 at 12:47 PM, cmc <[email protected]> wrote: >>> Just to clarify: you mean the host_id in >>> /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id, >>> correct? >>> >>> On Fri, Jun 30, 2017 at 9:47 AM, Martin Sivak <[email protected]> wrote: >>>> Hi, >>>> >>>> cleaning metadata won't help in this case. Try transferring the >>>> spm_ids you got from the engine to the proper hosted engine hosts so >>>> the hosted engine ids match the spm_ids. Then restart all hosted >>>> engine services. I would actually recommend restarting all hosts after >>>> this change, but I have no idea how many VMs you have running. >>>> >>>> Martin >>>> >>>> On Thu, Jun 29, 2017 at 8:27 PM, cmc <[email protected]> wrote: >>>>> Tried running a 'hosted-engine --clean-metadata" as per >>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1350539, since >>>>> ovirt-ha-agent was not running anyway, but it fails with the following >>>>> error: >>>>> >>>>> ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed >>>>> to start monitoring domain >>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>> during domain acquisition >>>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Traceback (most recent >>>>> call last): >>>>> File >>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>>> line 191, in _run_agent >>>>> return action(he) >>>>> File >>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>>> line 67, in action_clean >>>>> return he.clean(options.force_cleanup) >>>>> File >>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>> line 345, in clean >>>>> self._initialize_domain_monitor() >>>>> File >>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>> line 823, in _initialize_domain_monitor >>>>> raise Exception(msg) >>>>> Exception: Failed to start monitoring domain >>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>> during domain acquisition >>>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Trying to restart agent >>>>> WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, >>>>> attempt '0' >>>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors >>>>> occurred, giving up. Please review the log and consider filing a bug. 
>>>>> INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down >>>>> >>>>> On Thu, Jun 29, 2017 at 6:10 PM, cmc <[email protected]> wrote: >>>>>> Actually, it looks like sanlock problems: >>>>>> >>>>>> "SanlockInitializationError: Failed to initialize sanlock, the >>>>>> number of errors has exceeded the limit" >>>>>> >>>>>> >>>>>> >>>>>> On Thu, Jun 29, 2017 at 5:10 PM, cmc <[email protected]> wrote: >>>>>>> Sorry, I am mistaken, two hosts failed for the agent with the following >>>>>>> error: >>>>>>> >>>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine >>>>>>> ERROR Failed to start monitoring domain >>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>>> during domain acquisition >>>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine >>>>>>> ERROR Shutting down the agent because of 3 failures in a row! >>>>>>> >>>>>>> What could cause these timeouts? Some other service not running? >>>>>>> >>>>>>> On Thu, Jun 29, 2017 at 5:03 PM, cmc <[email protected]> wrote: >>>>>>>> Both services are up on all three hosts. The broker logs just report: >>>>>>>> >>>>>>>> Thread-6549::INFO::2017-06-29 >>>>>>>> 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) >>>>>>>> Connection established >>>>>>>> Thread-6549::INFO::2017-06-29 >>>>>>>> 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) >>>>>>>> Connection closed >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Cam >>>>>>>> >>>>>>>> On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak <[email protected]> >>>>>>>> wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> please make sure that both ovirt-ha-agent and ovirt-ha-broker services >>>>>>>>> are restarted and up. The error says the agent can't talk to the >>>>>>>>> broker. Is there anything in the broker.log? >>>>>>>>> >>>>>>>>> Best regards >>>>>>>>> >>>>>>>>> Martin Sivak >>>>>>>>> >>>>>>>>> On Thu, Jun 29, 2017 at 4:42 PM, cmc <[email protected]> wrote: >>>>>>>>>> I've restarted those two services across all hosts, have taken the >>>>>>>>>> Hosted Engine host out of maintenance, and when I try to migrate the >>>>>>>>>> Hosted Engine over to another host, it reports that all three hosts >>>>>>>>>> 'did not satisfy internal filter HA because it is not a Hosted Engine >>>>>>>>>> host'.
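A sketch of the restart sequence Martin suggests above, with the broker restarted before the agent (per the error, it is the agent that connects to the broker):

--------------------8<-------------------
# on every hosted-engine host
systemctl restart ovirt-ha-broker
systemctl restart ovirt-ha-agent
systemctl status ovirt-ha-broker ovirt-ha-agent
journalctl -u ovirt-ha-agent -n 50   # look for fresh broker connection errors
--------------------8<-------------------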
>>>>>>>>>> >>>>>>>>>> On the host that the Hosted Engine is currently on it reports in the >>>>>>>>>> agent.log: >>>>>>>>>> >>>>>>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR >>>>>>>>>> Connection closed: Connection closed >>>>>>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent >>>>>>>>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception >>>>>>>>>> getting service path: Connection closed >>>>>>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent >>>>>>>>>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent >>>>>>>>>> call last): >>>>>>>>>> File >>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>>>>>>>> line 191, in _run_agent >>>>>>>>>> return >>>>>>>>>> action(he) >>>>>>>>>> File >>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>>>>>>>> line 64, in action_proper >>>>>>>>>> return >>>>>>>>>> he.start_monitoring() >>>>>>>>>> File >>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>> line 411, in start_monitoring >>>>>>>>>> >>>>>>>>>> self._initialize_sanlock() >>>>>>>>>> File >>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>> line 691, in _initialize_sanlock >>>>>>>>>> >>>>>>>>>> constants.SERVICE_TYPE + constants.LOCKSPACE_EXTENSION) >>>>>>>>>> File >>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", >>>>>>>>>> line 162, in get_service_path >>>>>>>>>> >>>>>>>>>> .format(str(e))) >>>>>>>>>> RequestError: >>>>>>>>>> Failed >>>>>>>>>> to get service path: Connection closed >>>>>>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent >>>>>>>>>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart >>>>>>>>>> agent >>>>>>>>>> >>>>>>>>>> On Thu, Jun 29, 2017 at 1:25 PM, Martin Sivak <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> yep, you have to restart the ovirt-ha-agent and ovirt-ha-broker >>>>>>>>>>> services. >>>>>>>>>>> >>>>>>>>>>> The scheduling message just means that the host has score 0 or is >>>>>>>>>>> not >>>>>>>>>>> reporting score at all. >>>>>>>>>>> >>>>>>>>>>> Martin >>>>>>>>>>> >>>>>>>>>>> On Thu, Jun 29, 2017 at 1:33 PM, cmc <[email protected]> wrote: >>>>>>>>>>>> Thanks Martin, do I have to restart anything? When I try to use the >>>>>>>>>>>> 'migrate' operation, it complains that the other two hosts 'did not >>>>>>>>>>>> satisfy internal filter HA because it is not a Hosted Engine >>>>>>>>>>>> host..' >>>>>>>>>>>> (even though I reinstalled both these hosts with the 'deploy hosted >>>>>>>>>>>> engine' option, which suggests that something needs restarting. >>>>>>>>>>>> Should >>>>>>>>>>>> I worry about the sanlock errors, or will that be resolved by the >>>>>>>>>>>> change in host_id? >>>>>>>>>>>> >>>>>>>>>>>> Kind regards, >>>>>>>>>>>> >>>>>>>>>>>> Cam >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Jun 29, 2017 at 12:22 PM, Martin Sivak <[email protected]> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> Change the ids so they are distinct. I need to check if there is >>>>>>>>>>>>> a way >>>>>>>>>>>>> to read the SPM ids from the engine as using the same numbers >>>>>>>>>>>>> would be >>>>>>>>>>>>> the best. >>>>>>>>>>>>> >>>>>>>>>>>>> Martin >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jun 29, 2017 at 12:46 PM, cmc <[email protected]> wrote: >>>>>>>>>>>>>> Is there any way of recovering from this situation? 
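On Martin's point above about reading the SPM ids from the engine: a hedged sketch, assuming a standard engine Postgres setup and that this schema version has the vds_spm_id_map table (verify both before relying on it):

--------------------8<-------------------
# on the engine VM
sudo -u postgres psql engine -c "SELECT vds_id, vds_spm_id FROM vds_spm_id_map;"
# then, on each host, set host_id in /etc/ovirt-hosted-engine/hosted-engine.conf
# to that host's vds_spm_id and restart ovirt-ha-broker and ovirt-ha-agent
--------------------8<-------------------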
I'd prefer >>>>>>>>>>>>>> to fix >>>>>>>>>>>>>> the issue rather than re-deploy, but if there is no recovery >>>>>>>>>>>>>> path, I >>>>>>>>>>>>>> could perhaps try re-deploying the hosted engine. In which case, >>>>>>>>>>>>>> would >>>>>>>>>>>>>> the best option be to take a backup of the Hosted Engine, and >>>>>>>>>>>>>> then >>>>>>>>>>>>>> shut it down, re-initialise the SAN partition (or use another >>>>>>>>>>>>>> partition) and retry the deployment? Would it be better to use >>>>>>>>>>>>>> the >>>>>>>>>>>>>> older backup from the bare metal engine that I originally used, >>>>>>>>>>>>>> or use >>>>>>>>>>>>>> a backup from the Hosted Engine? I'm not sure if any VMs have >>>>>>>>>>>>>> been >>>>>>>>>>>>>> added since switching to Hosted Engine. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Unfortunately I have very little time left to get this working >>>>>>>>>>>>>> before >>>>>>>>>>>>>> I have to hand it over for eval (by end of Friday). >>>>>>>>>>>>>> >>>>>>>>>>>>>> Here are some log snippets from the cluster that are current >>>>>>>>>>>>>> >>>>>>>>>>>>>> In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2017-06-29 10:50:15,071+0100 INFO (monitor/207221b) >>>>>>>>>>>>>> [storage.SANLock] >>>>>>>>>>>>>> Acquiring host id for domain >>>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f (id: >>>>>>>>>>>>>> 3) (clusterlock:282) >>>>>>>>>>>>>> 2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) >>>>>>>>>>>>>> [storage.Monitor] >>>>>>>>>>>>>> Error acquiring host id 3 for domain >>>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f (monitor:558) >>>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>>> File "/usr/share/vdsm/storage/monitor.py", line 555, in >>>>>>>>>>>>>> _acquireHostId >>>>>>>>>>>>>> self.domain.acquireHostId(self.hostId, async=True) >>>>>>>>>>>>>> File "/usr/share/vdsm/storage/sd.py", line 790, in >>>>>>>>>>>>>> acquireHostId >>>>>>>>>>>>>> self._manifest.acquireHostId(hostId, async) >>>>>>>>>>>>>> File "/usr/share/vdsm/storage/sd.py", line 449, in >>>>>>>>>>>>>> acquireHostId >>>>>>>>>>>>>> self._domainLock.acquireHostId(hostId, async) >>>>>>>>>>>>>> File >>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", >>>>>>>>>>>>>> line 297, in acquireHostId >>>>>>>>>>>>>> raise se.AcquireHostIdFailure(self._sdUUID, e) >>>>>>>>>>>>>> AcquireHostIdFailure: Cannot acquire host id: >>>>>>>>>>>>>> ('207221b2-959b-426b-b945-18e1adfed62f', SanlockException(22, >>>>>>>>>>>>>> 'Sanlock >>>>>>>>>>>>>> lockspace add failure', 'Invalid argument')) >>>>>>>>>>>>>> >>>>>>>>>>>>>> From /var/log/ovirt-hosted-engine-ha/agent.log on the same host: >>>>>>>>>>>>>> >>>>>>>>>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>>>>>> 13:30:50,592::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) >>>>>>>>>>>>>> Failed to start monitoring domain >>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): >>>>>>>>>>>>>> timeout >>>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>>> MainThread::WARNING::2017-06-19 >>>>>>>>>>>>>> 13:30:50,593::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>>> Error while monitoring engine: Failed to start monitoring domain >>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): >>>>>>>>>>>>>> timeout >>>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>>> MainThread::WARNING::2017-06-19 >>>>>>>>>>>>>> 
13:30:50,593::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>>> Unexpected error >>>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>>> File >>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>>>>> line 443, in start_monitoring >>>>>>>>>>>>>> self._initialize_domain_monitor() >>>>>>>>>>>>>> File >>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>>>>> line 823, in _initialize_domain_monitor >>>>>>>>>>>>>> raise Exception(msg) >>>>>>>>>>>>>> Exception: Failed to start monitoring domain >>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): >>>>>>>>>>>>>> timeout >>>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>>>>>> 13:30:50,593::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>>> Shutting down the agent because of 3 failures in a row! >>>>>>>>>>>>>> >>>>>>>>>>>>>> From sanlock.log: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2017-06-29 11:17:06+0100 1194149 [2530]: add_lockspace >>>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 >>>>>>>>>>>>>> conflicts with name of list1 s5 >>>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 >>>>>>>>>>>>>> >>>>>>>>>>>>>> From the two other hosts: >>>>>>>>>>>>>> >>>>>>>>>>>>>> host 2: >>>>>>>>>>>>>> >>>>>>>>>>>>>> vdsm.log >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2017-06-29 10:53:47,755+0100 ERROR (jsonrpc/4) >>>>>>>>>>>>>> [jsonrpc.JsonRpcServer] >>>>>>>>>>>>>> Internal server error (__init__:570) >>>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>>> File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", >>>>>>>>>>>>>> line >>>>>>>>>>>>>> 565, in _handle_request >>>>>>>>>>>>>> res = method(**params) >>>>>>>>>>>>>> File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", >>>>>>>>>>>>>> line >>>>>>>>>>>>>> 202, in _dynamicMethod >>>>>>>>>>>>>> result = fn(*methodArgs) >>>>>>>>>>>>>> File "/usr/share/vdsm/API.py", line 1454, in >>>>>>>>>>>>>> getAllVmIoTunePolicies >>>>>>>>>>>>>> io_tune_policies_dict = self._cif.getAllVmIoTunePolicies() >>>>>>>>>>>>>> File "/usr/share/vdsm/clientIF.py", line 448, in >>>>>>>>>>>>>> getAllVmIoTunePolicies >>>>>>>>>>>>>> 'current_values': v.getIoTune()} >>>>>>>>>>>>>> File "/usr/share/vdsm/virt/vm.py", line 2803, in getIoTune >>>>>>>>>>>>>> result = self.getIoTuneResponse() >>>>>>>>>>>>>> File "/usr/share/vdsm/virt/vm.py", line 2816, in >>>>>>>>>>>>>> getIoTuneResponse >>>>>>>>>>>>>> res = self._dom.blockIoTune( >>>>>>>>>>>>>> File >>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line >>>>>>>>>>>>>> 47, in __getattr__ >>>>>>>>>>>>>> % self.vmid) >>>>>>>>>>>>>> NotConnectedError: VM u'a79e6b0e-fff4-4cba-a02c-4c00be151300' >>>>>>>>>>>>>> was not >>>>>>>>>>>>>> started yet or was shut down >>>>>>>>>>>>>> >>>>>>>>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log >>>>>>>>>>>>>> >>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>> 10:56:33,636::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) >>>>>>>>>>>>>> Found OVF_STORE: imgUUID:222610db-7880-4f4f-8559-a3635fd73555, >>>>>>>>>>>>>> volUUID:c6e0d29b-eabf-4a09-a330-df54cfdd73f1 >>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>> 10:56:33,926::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) 
>>>>>>>>>>>>>> Extracting Engine VM OVF from the OVF_STORE >>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>> 10:56:33,938::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>>>>>>>>>> OVF_STORE volume path: >>>>>>>>>>>>>> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/images/222610db-7880-4f4f-8559-a3635fd73555/c6e0d29b-eabf-4a09-a330-df54cfdd73f1 >>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>> 10:56:33,967::config::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>>>>>>>>>> Found an OVF for HE VM, trying to convert >>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>> 10:56:33,971::config::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>>>>>>>>>> Got vm.conf from OVF_STORE >>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>> 10:56:36,736::states::678::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) >>>>>>>>>>>>>> Score is 0 due to unexpected vm shutdown at Thu Jun 29 10:53:59 >>>>>>>>>>>>>> 2017 >>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>> 10:56:36,736::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>>> Current state EngineUnexpectedlyDown (score: 0) >>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>>> 10:56:46,772::config::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) >>>>>>>>>>>>>> Reloading vm.conf from the shared storage domain >>>>>>>>>>>>>> >>>>>>>>>>>>>> /var/log/messages: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jun 29 10:53:46 kvm-ldn-02 kernel: dd: sending ioctl 80306d02 to >>>>>>>>>>>>>> a partition! >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> host 1: >>>>>>>>>>>>>> >>>>>>>>>>>>>> /var/log/messages also in sanlock.log >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jun 29 11:01:02 kvm-ldn-01 sanlock[2400]: 2017-06-29 >>>>>>>>>>>>>> 11:01:02+0100 >>>>>>>>>>>>>> 678325 [9132]: s4531 delta_acquire host_id 1 busy1 1 2 1193177 >>>>>>>>>>>>>> 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03 >>>>>>>>>>>>>> Jun 29 11:01:03 kvm-ldn-01 sanlock[2400]: 2017-06-29 >>>>>>>>>>>>>> 11:01:03+0100 >>>>>>>>>>>>>> 678326 [24159]: s4531 add_lockspace fail result -262 >>>>>>>>>>>>>> >>>>>>>>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log: >>>>>>>>>>>>>> >>>>>>>>>>>>>> MainThread::ERROR::2017-06-27 >>>>>>>>>>>>>> 15:21:01,143::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) >>>>>>>>>>>>>> Failed to start monitoring domain >>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): >>>>>>>>>>>>>> timeout >>>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>>> MainThread::WARNING::2017-06-27 >>>>>>>>>>>>>> 15:21:01,144::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>>> Error while monitoring engine: Failed to start monitoring domain >>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): >>>>>>>>>>>>>> timeout >>>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>>> MainThread::WARNING::2017-06-27 >>>>>>>>>>>>>> 15:21:01,144::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>>> Unexpected error >>>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>>> File >>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>>>>> line 
443, in start_monitoring >>>>>>>>>>>>>> self._initialize_domain_monitor() >>>>>>>>>>>>>> File >>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>>>>> line 823, in _initialize_domain_monitor >>>>>>>>>>>>>> raise Exception(msg) >>>>>>>>>>>>>> Exception: Failed to start monitoring domain >>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): >>>>>>>>>>>>>> timeout >>>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>>> MainThread::ERROR::2017-06-27 >>>>>>>>>>>>>> 15:21:01,144::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>>> Shutting down the agent because of 3 failures in a row! >>>>>>>>>>>>>> MainThread::INFO::2017-06-27 >>>>>>>>>>>>>> 15:21:06,717::hosted_engine::848::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) >>>>>>>>>>>>>> VDSM domain monitor status: PENDING >>>>>>>>>>>>>> MainThread::INFO::2017-06-27 >>>>>>>>>>>>>> 15:21:09,335::hosted_engine::776::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) >>>>>>>>>>>>>> Failed to stop monitoring domain >>>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f): Storage domain is >>>>>>>>>>>>>> member of pool: u'domain=207221b2-959b-426b-b945-18e1adfed62f' >>>>>>>>>>>>>> MainThread::INFO::2017-06-27 >>>>>>>>>>>>>> 15:21:09,339::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(run) >>>>>>>>>>>>>> Agent shutting down >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for any help, >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cam >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Jun 28, 2017 at 11:25 AM, cmc <[email protected]> wrote: >>>>>>>>>>>>>>> Hi Martin, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> yes, on two of the machines they have the same host_id. The >>>>>>>>>>>>>>> other has >>>>>>>>>>>>>>> a different host_id. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> To update since yesterday: I reinstalled and deployed Hosted >>>>>>>>>>>>>>> Engine on >>>>>>>>>>>>>>> the other host (so all three hosts in the cluster now have it >>>>>>>>>>>>>>> installed). The second one I deployed said it was able to host >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> engine (unlike the first I reinstalled), so I tried putting the >>>>>>>>>>>>>>> host >>>>>>>>>>>>>>> with the Hosted Engine on it into maintenance to see if it would >>>>>>>>>>>>>>> migrate over. It managed to move all hosts but the Hosted >>>>>>>>>>>>>>> Engine. And >>>>>>>>>>>>>>> now the host that said it was able to host the engine says >>>>>>>>>>>>>>> 'unavailable due to HA score'. The host that it was trying to >>>>>>>>>>>>>>> move >>>>>>>>>>>>>>> from is now in 'preparing for maintenance' for the last 12 >>>>>>>>>>>>>>> hours. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The summary is: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> kvm-ldn-01 - one of the original, pre-Hosted Engine hosts, >>>>>>>>>>>>>>> reinstalled >>>>>>>>>>>>>>> with 'Deploy Hosted Engine'. No icon saying it can host the >>>>>>>>>>>>>>> Hosted >>>>>>>>>>>>>>> Engine, host_id of '2' in >>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf. >>>>>>>>>>>>>>> 'add_lockspace' fails in sanlock.log >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> kvm-ldn-02 - the other host that was pre-existing before Hosted >>>>>>>>>>>>>>> Engine >>>>>>>>>>>>>>> was created. Reinstalled with 'Deploy Hosted Engine'.
Had an >>>>>>>>>>>>>>> icon >>>>>>>>>>>>>>> saying that it was able to host the Hosted Engine, but after >>>>>>>>>>>>>>> migration >>>>>>>>>>>>>>> was attempted when putting kvm-ldn-03 into maintenance, it >>>>>>>>>>>>>>> reports: >>>>>>>>>>>>>>> 'unavailable due to HA score'. It has a host_id of '1' in >>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf. No errors in >>>>>>>>>>>>>>> sanlock.log >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> kvm-ldn-03 - this was the host I deployed Hosted Engine on, >>>>>>>>>>>>>>> which was >>>>>>>>>>>>>>> not part of the original cluster. I restored the bare-metal >>>>>>>>>>>>>>> engine >>>>>>>>>>>>>>> backup in the Hosted Engine on this host when deploying it, >>>>>>>>>>>>>>> without >>>>>>>>>>>>>>> error. It currently has the Hosted Engine on it (as the only VM >>>>>>>>>>>>>>> after >>>>>>>>>>>>>>> I put that host into maintenance to test the HA of Hosted >>>>>>>>>>>>>>> Engine). >>>>>>>>>>>>>>> Sanlock log shows conflicts >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I will look through all the logs for any other errors. Please >>>>>>>>>>>>>>> let me >>>>>>>>>>>>>>> know if you need any logs or other clarification/information. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Campbell >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Jun 28, 2017 at 9:25 AM, Martin Sivak >>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> can you please check the contents of >>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf or >>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine-ha/agent.conf (I am not sure which >>>>>>>>>>>>>>>> one it is >>>>>>>>>>>>>>>> right now) and search for host-id? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Make sure the IDs are different. If they are not, then there >>>>>>>>>>>>>>>> is a bug somewhere. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Martin >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 6:26 PM, cmc <[email protected]> wrote: >>>>>>>>>>>>>>>>> I see this on the host it is trying to migrate in >>>>>>>>>>>>>>>>> /var/log/sanlock: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2017-06-27 17:10:40+0100 527703 [2407]: s3528 lockspace >>>>>>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 >>>>>>>>>>>>>>>>> 2017-06-27 17:13:00+0100 527843 [27446]: s3528 delta_acquire >>>>>>>>>>>>>>>>> host_id 1 >>>>>>>>>>>>>>>>> busy1 1 2 1042692 >>>>>>>>>>>>>>>>> 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03 >>>>>>>>>>>>>>>>> 2017-06-27 17:13:01+0100 527844 [2407]: s3528 add_lockspace >>>>>>>>>>>>>>>>> fail result -262 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The sanlock service is running. Why would this occur? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> C >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 5:21 PM, cmc <[email protected]> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> Hi Martin, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks for the reply. I have done this, and the deployment >>>>>>>>>>>>>>>>>> completed >>>>>>>>>>>>>>>>>> without error. However, it still will not allow the Hosted >>>>>>>>>>>>>>>>>> Engine >>>>>>>>>>>>>>>>>> migrate to another host. 
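Given the duplicate host_id summary above, a sketch for checking which host id each node actually holds in the hosted-engine lockspace:

--------------------8<-------------------
# on each host: list registered sanlock lockspaces and their host ids
sanlock client status
# two nodes joining the 207221b2-959b-426b-b945-18e1adfed62f lockspace with
# the same host id is what produces the "delta_acquire host_id 1 busy1" and
# "add_lockspace fail result -262" pairs quoted above
--------------------8<-------------------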
The >>>>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf got created ok >>>>>>>>>>>>>>>>>> on the host >>>>>>>>>>>>>>>>>> I re-installed, but the ovirt-ha-broker.service, though it >>>>>>>>>>>>>>>>>> starts, >>>>>>>>>>>>>>>>>> reports: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> --------------------8<------------------- >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Jun 27 14:58:26 kvm-ldn-01 systemd[1]: Starting oVirt Hosted >>>>>>>>>>>>>>>>>> Engine >>>>>>>>>>>>>>>>>> High Availability Communications Broker... >>>>>>>>>>>>>>>>>> Jun 27 14:58:27 kvm-ldn-01 ovirt-ha-broker[6101]: >>>>>>>>>>>>>>>>>> ovirt-ha-broker >>>>>>>>>>>>>>>>>> ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker >>>>>>>>>>>>>>>>>> ERROR >>>>>>>>>>>>>>>>>> Failed to read metadata from >>>>>>>>>>>>>>>>>> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata >>>>>>>>>>>>>>>>>> Traceback >>>>>>>>>>>>>>>>>> (most >>>>>>>>>>>>>>>>>> recent call last): >>>>>>>>>>>>>>>>>> File >>>>>>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", >>>>>>>>>>>>>>>>>> line 129, in get_raw_stats_for_service_type >>>>>>>>>>>>>>>>>> f = >>>>>>>>>>>>>>>>>> os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC) >>>>>>>>>>>>>>>>>> OSError: >>>>>>>>>>>>>>>>>> [Errno 2] >>>>>>>>>>>>>>>>>> No such file or directory: >>>>>>>>>>>>>>>>>> '/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata' >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> --------------------8<------------------- >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I checked the path, and it exists. I can run 'less -f' on it >>>>>>>>>>>>>>>>>> fine. The >>>>>>>>>>>>>>>>>> perms are slightly different on the host that is running the >>>>>>>>>>>>>>>>>> VM vs the >>>>>>>>>>>>>>>>>> one that is reporting errors (600 vs 660), ownership is >>>>>>>>>>>>>>>>>> vdsm:qemu. Is >>>>>>>>>>>>>>>>>> this a san locking issue? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks for any help, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:41 PM, Martin Sivak >>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>> Should it be? It was not in the instructions for the >>>>>>>>>>>>>>>>>>>> migration from >>>>>>>>>>>>>>>>>>>> bare-metal to Hosted VM >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The hosted engine will only migrate to hosts that have the >>>>>>>>>>>>>>>>>>> services >>>>>>>>>>>>>>>>>>> running. Please put one other host to maintenance and >>>>>>>>>>>>>>>>>>> select Hosted >>>>>>>>>>>>>>>>>>> engine action: DEPLOY in the reinstall dialog. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Best regards >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Martin Sivak >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:23 PM, cmc <[email protected]> >>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> I changed the >>>>>>>>>>>>>>>>>>>> 'os.other.devices.display.protocols.value.3.6 = >>>>>>>>>>>>>>>>>>>> spice/qxl,vnc/cirrus,vnc/qxl' line to have the same >>>>>>>>>>>>>>>>>>>> display protocols >>>>>>>>>>>>>>>>>>>> as 4 and the hosted engine now appears in the list of VMs. >>>>>>>>>>>>>>>>>>>> I am >>>>>>>>>>>>>>>>>>>> guessing the compatibility version was causing it to use >>>>>>>>>>>>>>>>>>>> the 3.6 >>>>>>>>>>>>>>>>>>>> version. However, I am still unable to migrate the engine >>>>>>>>>>>>>>>>>>>> VM to >>>>>>>>>>>>>>>>>>>> another host. 
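On the broker's earlier 'No such file or directory' traceback: the os.open() there uses a direct-I/O flag, so a plain 'less' succeeding does not prove much. A sketch for checking the link and a direct read as the vdsm user (path as quoted above):

--------------------8<-------------------
ls -lL /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/
sudo -u vdsm dd of=/dev/null bs=512 count=1 iflag=direct \
  if=/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata
--------------------8<-------------------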
When I try putting the host it is currently >>>>>>>>>>>>>>>>>>>> on into >>>>>>>>>>>>>>>>>>>> maintenance, it reports: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Error while executing action: Cannot switch the Host(s) to >>>>>>>>>>>>>>>>>>>> Maintenance mode. >>>>>>>>>>>>>>>>>>>> There are no available hosts capable of running the engine >>>>>>>>>>>>>>>>>>>> VM. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Running 'hosted-engine --vm-status' still shows 'Engine >>>>>>>>>>>>>>>>>>>> status: >>>>>>>>>>>>>>>>>>>> unknown stale-data'. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The ovirt-ha-broker service is only running on one host. >>>>>>>>>>>>>>>>>>>> It was set to >>>>>>>>>>>>>>>>>>>> 'disabled' in systemd. It won't start as there is no >>>>>>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf on the other >>>>>>>>>>>>>>>>>>>> two hosts. >>>>>>>>>>>>>>>>>>>> Should it be? It was not in the instructions for the >>>>>>>>>>>>>>>>>>>> migration from >>>>>>>>>>>>>>>>>>>> bare-metal to Hosted VM >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 1:07 PM, cmc <[email protected]> >>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> Hi Tomas, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> So in my >>>>>>>>>>>>>>>>>>>>> /usr/share/ovirt-engine/conf/osinfo-defaults.properties >>>>>>>>>>>>>>>>>>>>> on my >>>>>>>>>>>>>>>>>>>>> engine VM, I have: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> os.other.devices.display.protocols.value = >>>>>>>>>>>>>>>>>>>>> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus >>>>>>>>>>>>>>>>>>>>> os.other.devices.display.protocols.value.3.6 = >>>>>>>>>>>>>>>>>>>>> spice/qxl,vnc/cirrus,vnc/qxl >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> That seems to match - I assume since this is 4.1, the 3.6 >>>>>>>>>>>>>>>>>>>>> should not apply >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Is there somewhere else I should be looking? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek >>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek >>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> > On 22 Jun 2017, at 12:31, Martin Sivak >>>>>>>>>>>>>>>>>>>>>>> > <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > Tomas, what fields are needed in a VM to pass the >>>>>>>>>>>>>>>>>>>>>>> > check that causes >>>>>>>>>>>>>>>>>>>>>>> > the following error? >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> >>>>> WARN >>>>>>>>>>>>>>>>>>>>>>> >>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>>>>>>>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation >>>>>>>>>>>>>>>>>>>>>>> >>>>> of action >>>>>>>>>>>>>>>>>>>>>>> >>>>> 'ImportVm' >>>>>>>>>>>>>>>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: >>>>>>>>>>>>>>>>>>>>>>> >>>>> VAR__ACTION__IMPORT >>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> to match the OS and VM Display type;-) >>>>>>>>>>>>>>>>>>>>>>> Configuration is in osinfo….e.g. 
if that is import from >>>>>>>>>>>>>>>>>>>>>>> older releases on >>>>>>>>>>>>>>>>>>>>>>> Linux this is typically caused by the change of cirrus >>>>>>>>>>>>>>>>>>>>>>> to vga for non-SPICE >>>>>>>>>>>>>>>>>>>>>>> VMs >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> yep, the default supported combinations for 4.0+ is this: >>>>>>>>>>>>>>>>>>>>>> os.other.devices.display.protocols.value = >>>>>>>>>>>>>>>>>>>>>> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > Thanks. >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > On Thu, Jun 22, 2017 at 12:19 PM, cmc >>>>>>>>>>>>>>>>>>>>>>> > <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >> Hi Martin, >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>>>> >>> just as a random comment, do you still have the >>>>>>>>>>>>>>>>>>>>>>> >>> database backup from >>>>>>>>>>>>>>>>>>>>>>> >>> the bare metal -> VM attempt? It might be possible >>>>>>>>>>>>>>>>>>>>>>> >>> to just try again >>>>>>>>>>>>>>>>>>>>>>> >>> using it. Or in the worst case.. update the >>>>>>>>>>>>>>>>>>>>>>> >>> offending value there >>>>>>>>>>>>>>>>>>>>>>> >>> before restoring it to the new engine instance. >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> I still have the backup. I'd rather do the latter, >>>>>>>>>>>>>>>>>>>>>>> >> as re-running the >>>>>>>>>>>>>>>>>>>>>>> >> HE deployment is quite lengthy and involved (I have >>>>>>>>>>>>>>>>>>>>>>> >> to re-initialise >>>>>>>>>>>>>>>>>>>>>>> >> the FC storage each time). Do you know what the >>>>>>>>>>>>>>>>>>>>>>> >> offending value(s) >>>>>>>>>>>>>>>>>>>>>>> >> would be? Would it be in the Postgres DB or in a >>>>>>>>>>>>>>>>>>>>>>> >> config file >>>>>>>>>>>>>>>>>>>>>>> >> somewhere? >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> Cheers, >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> Cam >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >>> Regards >>>>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>>>> >>> Martin Sivak >>>>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>>>> >>> On Thu, Jun 22, 2017 at 11:39 AM, cmc >>>>>>>>>>>>>>>>>>>>>>> >>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>> Hi Yanir, >>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>> >>>> Thanks for the reply. >>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>> First of all, maybe a chain reaction of : >>>>>>>>>>>>>>>>>>>>>>> >>>>> WARN >>>>>>>>>>>>>>>>>>>>>>> >>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>>>>>>>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation >>>>>>>>>>>>>>>>>>>>>>> >>>>> of action >>>>>>>>>>>>>>>>>>>>>>> >>>>> 'ImportVm' >>>>>>>>>>>>>>>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: >>>>>>>>>>>>>>>>>>>>>>> >>>>> VAR__ACTION__IMPORT >>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS >>>>>>>>>>>>>>>>>>>>>>> >>>>> is causing the hosted engine vm not to be set up >>>>>>>>>>>>>>>>>>>>>>> >>>>> correctly and >>>>>>>>>>>>>>>>>>>>>>> >>>>> further >>>>>>>>>>>>>>>>>>>>>>> >>>>> actions were made when the hosted engine vm wasn't >>>>>>>>>>>>>>>>>>>>>>> >>>>> in a stable state. >>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>> As for now, are you trying to revert back to a >>>>>>>>>>>>>>>>>>>>>>> >>>>> previous/initial >>>>>>>>>>>>>>>>>>>>>>> >>>>> state ?
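On the osinfo edit discussed above: changes made directly to /usr/share/ovirt-engine/conf/osinfo-defaults.properties are overwritten on upgrade; the usual pattern (a sketch, assuming a standard 4.1 engine layout, file name hypothetical) is a numbered drop-in that the engine reads on restart:

--------------------8<-------------------
# /etc/ovirt-engine/osinfo.conf.d/90-display.properties  (hypothetical name)
os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus

# then, on the engine VM:
systemctl restart ovirt-engine
--------------------8<-------------------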
>>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>> >>>> I'm not trying to revert it to a previous state >>>>>>>>>>>>>>>>>>>>>>> >>>> for now. This was a >>>>>>>>>>>>>>>>>>>>>>> >>>> migration from a bare metal engine, and it didn't >>>>>>>>>>>>>>>>>>>>>>> >>>> report any error >>>>>>>>>>>>>>>>>>>>>>> >>>> during the migration. I'd had some problems on my >>>>>>>>>>>>>>>>>>>>>>> >>>> first attempts at >>>>>>>>>>>>>>>>>>>>>>> >>>> this migration, whereby it never completed (due to >>>>>>>>>>>>>>>>>>>>>>> >>>> a proxy issue) but >>>>>>>>>>>>>>>>>>>>>>> >>>> I managed to resolve this. Do you know of a way to >>>>>>>>>>>>>>>>>>>>>>> >>>> get the Hosted >>>>>>>>>>>>>>>>>>>>>>> >>>> Engine VM into a stable state, without rebuilding >>>>>>>>>>>>>>>>>>>>>>> >>>> the entire cluster >>>>>>>>>>>>>>>>>>>>>>> >>>> from scratch (since I have a lot of VMs on it)? >>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>> >>>> Thanks for any help. >>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>> >>>> Regards, >>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>> >>>> Cam >>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>> Regards, >>>>>>>>>>>>>>>>>>>>>>> >>>>> Yanir >>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>> On Wed, Jun 21, 2017 at 4:32 PM, cmc >>>>>>>>>>>>>>>>>>>>>>> >>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>> Hi Jenny/Martin, >>>>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>> Any idea what I can do here? The hosted engine >>>>>>>>>>>>>>>>>>>>>>> >>>>>> VM has no log on any >>>>>>>>>>>>>>>>>>>>>>> >>>>>> host in /var/log/libvirt/qemu, and I fear that >>>>>>>>>>>>>>>>>>>>>>> >>>>>> if I need to put the >>>>>>>>>>>>>>>>>>>>>>> >>>>>> host into maintenance, e.g., to upgrade it that >>>>>>>>>>>>>>>>>>>>>>> >>>>>> I created it on >>>>>>>>>>>>>>>>>>>>>>> >>>>>> (which >>>>>>>>>>>>>>>>>>>>>>> >>>>>> I think is hosting it), or if it fails for any >>>>>>>>>>>>>>>>>>>>>>> >>>>>> reason, it won't get >>>>>>>>>>>>>>>>>>>>>>> >>>>>> migrated to another host, and I will not be able >>>>>>>>>>>>>>>>>>>>>>> >>>>>> to manage the >>>>>>>>>>>>>>>>>>>>>>> >>>>>> cluster. It seems to be a very dangerous >>>>>>>>>>>>>>>>>>>>>>> >>>>>> position to be in. >>>>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>> Cam >>>>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>> On Wed, Jun 21, 2017 at 11:48 AM, cmc >>>>>>>>>>>>>>>>>>>>>>> >>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Thanks Martin. The hosts are all part of the >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> same cluster. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> I get these errors in the engine.log on the >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> engine: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z WARN >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Validation of action >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 'ImportVm' >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> failed for user SYST >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> EM. 
Reasons: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z INFO >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Lock >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> freed to object >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 'EngineLock:{exclusiveLocks='[a >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 79e6b0e-fff4-4cba-a02c-4c00be151300=<VM, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> HostedEngine>, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> HostedEngine=<VM_NAME, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> ACTION_TYPE_FAILED_NAME_ALREADY_USED>]', >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> sharedLocks= >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> '[a79e6b0e-fff4-4cba-a02c-4c00be151300=<REMOTE_VM, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> HostedEngine>]'}' >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z ERROR >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> [org.ovirt.engine.core.bll.HostedEngineImporter] >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> (org.ovirt.thread.pool-6-thread-23) [] Failed >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> importing the Hosted >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Engine VM >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> The sanlock.log reports conflicts on that same >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> host, and a >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> different >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> error on the other hosts, not sure if they are >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> related. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> And this in the >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> /var/log/ovirt-hosted-engine-ha/agent log on the >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> host >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> which I deployed the hosted engine VM on: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Unable to extract HEVM OVF >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> 13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Failed extracting VM OVF from the OVF_STORE >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> volume, falling back >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> to >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> initial vm.conf >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> I've seen some of these issues reported in >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> bugzilla, but they were >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> for >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> older versions of oVirt (and appear to be >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> resolved). >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> I will install that package on the other two >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> hosts, for which I >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> will >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> put them in maintenance as vdsm is installed as >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> an upgrade. 
I >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> guess >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> restarting vdsm is a good idea after that? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Campbell >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> <[email protected]> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> you do not have to install it on all hosts. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> But you should have >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> more >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> than one and ideally all hosted engine enabled >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> nodes should >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> belong to >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> the same engine cluster. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Best regards >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Martin Sivak >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> On Wed, Jun 21, 2017 at 11:29 AM, cmc >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Hi Jenny, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Does ovirt-hosted-engine-ha need to be >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> installed across all >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> hosts? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Could that be the reason it is failing to see >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> it properly? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Cam >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> On Mon, Jun 19, 2017 at 1:27 PM, cmc >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Hi Jenny, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Logs are attached. I can see errors in >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> there, but am unsure how >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> they >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> arose. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Campbell >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> On Mon, Jun 19, 2017 at 12:29 PM, Evgenia >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Tokar >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> <[email protected]> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> From the output it looks like the agent is >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> down, try starting >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> it by >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> running: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> systemctl start ovirt-ha-agent. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> The engine is supposed to see the hosted >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> engine storage domain >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> import it >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> to the system, then it should import the >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> hosted engine vm. 
>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Can you attach the agent log from the host >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-hosted-engine-ha/agent.log) >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> and the engine log from the engine vm >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-engine/engine.log)? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Jenny >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> On Mon, Jun 19, 2017 at 12:41 PM, cmc >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> <[email protected]> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Hi Jenny, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What version are you running? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> 4.1.2.2-1.el7.centos >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> For the hosted engine vm to be imported >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> and displayed in the >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> engine, you >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> must first create a master storage domain. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> To provide a bit more detail: this was a >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> migration of a >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> bare-metal >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> engine in an existing cluster to a hosted >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> engine VM for that >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> cluster. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> As part of this migration, I built an >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> entirely new host and >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> ran >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> 'hosted-engine --deploy' (followed these >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> instructions: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_Metal_to_an_EL-Based_Self-Hosted_Environment/). >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> I restored the backup from the engine and >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> it completed >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> without any >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> errors. I didn't see any instructions >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> regarding a master >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> storage >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> domain in the page above. The cluster has >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> two existing master >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> storage >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> domains, one is fibre channel, which is >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> up, and one ISO >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> domain, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> which >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> is currently offline. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What do you mean the hosted engine >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> commands are failing? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> happens >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> when >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> you run hosted-engine --vm-status now? 
>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Interestingly, whereas when I ran it >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> before, it exited with >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> no >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> output >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> and a return code of '1', it now reports: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> --== Host 1 status ==-- >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> conf_on_shared_storage : True >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Status up-to-date : False >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Hostname : >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> kvm-ldn-03.ldn.fscfc.co.uk >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Host ID : 1 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Engine status : >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> unknown stale-data >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Score : 0 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> stopped : True >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Local maintenance : False >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> crc32 : >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> 0217f07b >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> local_conf_timestamp : 2911 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Host timestamp : 2897 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Extra metadata (valid at timestamp): >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> metadata_parse_version=1 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> metadata_feature_version=1 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> timestamp=2897 (Thu Jun 15 16:22:54 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> 2017) >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> host-id=1 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> score=0 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> vm_conf_refresh_time=2911 (Thu Jun >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> 15 16:23:08 2017) >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> conf_on_shared_storage=True >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> maintenance=False >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> state=AgentStopped >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> stopped=True >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Yet I can login to the web GUI fine. I >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> guess it is not HA due >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> being >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> in an unknown state currently? Does the >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> hosted-engine-ha rpm >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> need >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> be installed across all nodes in the >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> cluster, btw? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Thanks for the help, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Jenny Tokar >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jun 15, 2017 at 6:32 PM, cmc >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> <[email protected]> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I've migrated from a bare-metal engine >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> to a hosted engine. 
>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> There >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> were >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> no errors during the install, however, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> the hosted engine >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> did not >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> get >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> started. I tried running: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> hosted-engine --status >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> on the host I deployed it on, and it >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> returns nothing (exit >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> code >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> is 1 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> however). I could not ping it either. So >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I tried starting >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> it via >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> 'hosted-engine --vm-start' and it >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> returned: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Virtual machine does not exist >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> But it then became available. I logged >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> into it >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> successfully. It >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> is not >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> in the list of VMs however. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Any ideas why the hosted-engine commands >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> fail, and why it >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> is not >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> the list of virtual machines? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for any help, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Users mailing list >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> [email protected] >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> Users mailing list >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> [email protected] >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users >>>>>>>>>>>>>>>>>>>>>>> >>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>> >>>>>> Users mailing list >>>>>>>>>>>>>>>>>>>>>>> >>>>>> [email protected] >>>>>>>>>>>>>>>>>>>>>>> >>>>>> http://lists.ovirt.org/mailman/listinfo/users >>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>> > _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>> > Users mailing list >>>>>>>>>>>>>>>>>>>>>>> > [email protected] >>>>>>>>>>>>>>>>>>>>>>> > http://lists.ovirt.org/mailman/listinfo/users >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> _______________________________________________ Users mailing list [email protected] http://lists.ovirt.org/mailman/listinfo/users

