So, to confirm: I can run hosted-engine --set-maintenance --mode=global from any node. By 'agents', you mean ovirt-ha-agent, right? And this shouldn't affect any running VMs, correct? Sorry for the questions, I just want to do this correctly and not make assumptions :)
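For reference, the id consistency Martin describes below (every hosted-engine host having a distinct host_id in /etc/ovirt-hosted-engine/hosted-engine.conf that matches the SPM id the engine recorded for it) can be sketched in plain Python. This is a rough illustration only, not an oVirt tool: the conf texts and the spm_ids mapping are made up, and the real spm_ids come from the engine (the vds_spm_id_map table).

```python
# Rough sketch (NOT an oVirt tool): check that every hosted-engine host
# has a distinct host_id and that it matches the engine's spm_id for it.
# The conf texts and spm_ids below are illustrative placeholders.

def parse_host_id(conf_text):
    """Pull host_id out of hosted-engine.conf-style key=value text."""
    for line in conf_text.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "host_id":
            return int(value.strip())
    return None

def find_mismatches(host_confs, spm_ids):
    """Map hostname -> (host_id, spm_id) for every host where they differ."""
    return {host: (parse_host_id(conf), spm_ids.get(host))
            for host, conf in host_confs.items()
            if parse_host_id(conf) != spm_ids.get(host)}

confs = {
    "kvm-ldn-01": "host_id=2\n",
    "kvm-ldn-02": "host_id=1\n",
    "kvm-ldn-03": "host_id=1\n",   # duplicate id -> sanlock conflicts
}
spm_ids = {"kvm-ldn-01": 1, "kvm-ldn-02": 2, "kvm-ldn-03": 3}  # made up

for host, (have, want) in sorted(find_mismatches(confs, spm_ids).items()):
    print("%s: host_id=%s but spm_id=%s" % (host, have, want))
```

Any host this flags would need its hosted-engine.conf edited and the HA services restarted, per the advice quoted below.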
Cheers, C On Fri, Jun 30, 2017 at 12:12 PM, Martin Sivak <[email protected]> wrote: > Hi, > >> Just to clarify: you mean the host_id in >> /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id, >> correct? > > Exactly. > > Put the cluster to global maintenance first. Or kill all agents (has > the same effect). > > Martin > > On Fri, Jun 30, 2017 at 12:47 PM, cmc <[email protected]> wrote: >> Just to clarify: you mean the host_id in >> /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id, >> correct? >> >> On Fri, Jun 30, 2017 at 9:47 AM, Martin Sivak <[email protected]> wrote: >>> Hi, >>> >>> cleaning metadata won't help in this case. Try transferring the >>> spm_ids you got from the engine to the proper hosted engine hosts so >>> the hosted engine ids match the spm_ids. Then restart all hosted >>> engine services. I would actually recommend restarting all hosts after >>> this change, but I have no idea how many VMs you have running. >>> >>> Martin >>> >>> On Thu, Jun 29, 2017 at 8:27 PM, cmc <[email protected]> wrote: >>>> Tried running a 'hosted-engine --clean-metadata" as per >>>> https://bugzilla.redhat.com/show_bug.cgi?id=1350539, since >>>> ovirt-ha-agent was not running anyway, but it fails with the following >>>> error: >>>> >>>> ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed >>>> to start monitoring domain >>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>> during domain acquisition >>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Traceback (most recent >>>> call last): >>>> File >>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>> line 191, in _run_agent >>>> return action(he) >>>> File >>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>> line 67, in action_clean >>>> return he.clean(options.force_cleanup) >>>> File >>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>> line 345, 
in clean >>>> self._initialize_domain_monitor() >>>> File >>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>> line 823, in _initialize_domain_monitor >>>> raise Exception(msg) >>>> Exception: Failed to start monitoring domain >>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>> during domain acquisition >>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Trying to restart agent >>>> WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt >>>> '0' >>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors >>>> occurred, giving up. Please review the log and consider filing a bug. >>>> INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down >>>> >>>> On Thu, Jun 29, 2017 at 6:10 PM, cmc <[email protected]> wrote: >>>>> Actually, it looks like sanlock problems: >>>>> >>>>> "SanlockInitializationError: Failed to initialize sanlock, the >>>>> number of errors has exceeded the limit" >>>>> >>>>> >>>>> >>>>> On Thu, Jun 29, 2017 at 5:10 PM, cmc <[email protected]> wrote: >>>>>> Sorry, I am mistaken, two hosts failed for the agent with the following >>>>>> error: >>>>>> >>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine >>>>>> ERROR Failed to start monitoring domain >>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>> during domain acquisition >>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine >>>>>> ERROR Shutting down the agent because of 3 failures in a row! >>>>>> >>>>>> What could cause these timeouts? Some other service not running? >>>>>> >>>>>> On Thu, Jun 29, 2017 at 5:03 PM, cmc <[email protected]> wrote: >>>>>>> Both services are up on all three hosts. 
The broker logs just report: >>>>>>> >>>>>>> Thread-6549::INFO::2017-06-29 >>>>>>> 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) >>>>>>> Connection established >>>>>>> Thread-6549::INFO::2017-06-29 >>>>>>> 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) >>>>>>> Connection closed >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Cam >>>>>>> >>>>>>> On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak <[email protected]> wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> please make sure that both ovirt-ha-agent and ovirt-ha-broker services >>>>>>>> are restarted and up. The error says the agent can't talk to the >>>>>>>> broker. Is there anything in the broker.log? >>>>>>>> >>>>>>>> Best regards >>>>>>>> >>>>>>>> Martin Sivak >>>>>>>> >>>>>>>> On Thu, Jun 29, 2017 at 4:42 PM, cmc <[email protected]> wrote: >>>>>>>>> I've restarted those two services across all hosts, have taken the >>>>>>>>> Hosted Engine host out of maintenance, and when I try to migrate the >>>>>>>>> Hosted Engine over to another host, it reports that all three hosts >>>>>>>>> 'did not satisfy internal filter HA because it is not a Hosted Engine >>>>>>>>> host'.
>>>>>>>>> >>>>>>>>> On the host that the Hosted Engine is currently on it reports in the >>>>>>>>> agent.log: >>>>>>>>> >>>>>>>>> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR >>>>>>>>> Connection closed: Connection closed >>>>>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent >>>>>>>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception >>>>>>>>> getting service path: Connection closed >>>>>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent >>>>>>>>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent >>>>>>>>> call last): >>>>>>>>> File >>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>>>>>>> line 191, in _run_agent >>>>>>>>> return >>>>>>>>> action(he) >>>>>>>>> File >>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>>>>>>> line 64, in action_proper >>>>>>>>> return >>>>>>>>> he.start_monitoring() >>>>>>>>> File >>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>> line 411, in start_monitoring >>>>>>>>> >>>>>>>>> self._initialize_sanlock() >>>>>>>>> File >>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>> line 691, in _initialize_sanlock >>>>>>>>> >>>>>>>>> constants.SERVICE_TYPE + constants.LOCKSPACE_EXTENSION) >>>>>>>>> File >>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", >>>>>>>>> line 162, in get_service_path >>>>>>>>> .format(str(e))) >>>>>>>>> RequestError: Failed >>>>>>>>> to get service path: Connection closed >>>>>>>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent >>>>>>>>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent >>>>>>>>> >>>>>>>>> On Thu, Jun 29, 2017 at 1:25 PM, Martin Sivak <[email protected]> >>>>>>>>> wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> yep, you have to restart the ovirt-ha-agent and 
ovirt-ha-broker >>>>>>>>>> services. >>>>>>>>>> >>>>>>>>>> The scheduling message just means that the host has score 0 or is not >>>>>>>>>> reporting score at all. >>>>>>>>>> >>>>>>>>>> Martin >>>>>>>>>> >>>>>>>>>> On Thu, Jun 29, 2017 at 1:33 PM, cmc <[email protected]> wrote: >>>>>>>>>>> Thanks Martin, do I have to restart anything? When I try to use the >>>>>>>>>>> 'migrate' operation, it complains that the other two hosts 'did not >>>>>>>>>>> satisfy internal filter HA because it is not a Hosted Engine host..' >>>>>>>>>>> (even though I reinstalled both these hosts with the 'deploy hosted >>>>>>>>>>> engine' option, which suggests that something needs restarting. >>>>>>>>>>> Should >>>>>>>>>>> I worry about the sanlock errors, or will that be resolved by the >>>>>>>>>>> change in host_id? >>>>>>>>>>> >>>>>>>>>>> Kind regards, >>>>>>>>>>> >>>>>>>>>>> Cam >>>>>>>>>>> >>>>>>>>>>> On Thu, Jun 29, 2017 at 12:22 PM, Martin Sivak <[email protected]> >>>>>>>>>>> wrote: >>>>>>>>>>>> Change the ids so they are distinct. I need to check if there is a >>>>>>>>>>>> way >>>>>>>>>>>> to read the SPM ids from the engine as using the same numbers >>>>>>>>>>>> would be >>>>>>>>>>>> the best. >>>>>>>>>>>> >>>>>>>>>>>> Martin >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Jun 29, 2017 at 12:46 PM, cmc <[email protected]> wrote: >>>>>>>>>>>>> Is there any way of recovering from this situation? I'd prefer to >>>>>>>>>>>>> fix >>>>>>>>>>>>> the issue rather than re-deploy, but if there is no recovery >>>>>>>>>>>>> path, I >>>>>>>>>>>>> could perhaps try re-deploying the hosted engine. In which case, >>>>>>>>>>>>> would >>>>>>>>>>>>> the best option be to take a backup of the Hosted Engine, and then >>>>>>>>>>>>> shut it down, re-initialise the SAN partition (or use another >>>>>>>>>>>>> partition) and retry the deployment? 
Would it be better to use the >>>>>>>>>>>>> older backup from the bare metal engine that I originally used, >>>>>>>>>>>>> or use >>>>>>>>>>>>> a backup from the Hosted Engine? I'm not sure if any VMs have been >>>>>>>>>>>>> added since switching to Hosted Engine. >>>>>>>>>>>>> >>>>>>>>>>>>> Unfortunately I have very little time left to get this working >>>>>>>>>>>>> before >>>>>>>>>>>>> I have to hand it over for eval (by end of Friday). >>>>>>>>>>>>> >>>>>>>>>>>>> Here are some log snippets from the cluster that are current >>>>>>>>>>>>> >>>>>>>>>>>>> In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine: >>>>>>>>>>>>> >>>>>>>>>>>>> 2017-06-29 10:50:15,071+0100 INFO (monitor/207221b) >>>>>>>>>>>>> [storage.SANLock] >>>>>>>>>>>>> Acquiring host id for domain 207221b2-959b-426b-b945-18e1adfed62f >>>>>>>>>>>>> (id: >>>>>>>>>>>>> 3) (clusterlock:282) >>>>>>>>>>>>> 2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) >>>>>>>>>>>>> [storage.Monitor] >>>>>>>>>>>>> Error acquiring host id 3 for domain >>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f (monitor:558) >>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>> File "/usr/share/vdsm/storage/monitor.py", line 555, in >>>>>>>>>>>>> _acquireHostId >>>>>>>>>>>>> self.domain.acquireHostId(self.hostId, async=True) >>>>>>>>>>>>> File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId >>>>>>>>>>>>> self._manifest.acquireHostId(hostId, async) >>>>>>>>>>>>> File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId >>>>>>>>>>>>> self._domainLock.acquireHostId(hostId, async) >>>>>>>>>>>>> File >>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", >>>>>>>>>>>>> line 297, in acquireHostId >>>>>>>>>>>>> raise se.AcquireHostIdFailure(self._sdUUID, e) >>>>>>>>>>>>> AcquireHostIdFailure: Cannot acquire host id: >>>>>>>>>>>>> ('207221b2-959b-426b-b945-18e1adfed62f', SanlockException(22, >>>>>>>>>>>>> 'Sanlock >>>>>>>>>>>>> lockspace add failure', 'Invalid argument')) 
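The AcquireHostIdFailure quoted above is what a duplicate host_id looks like from vdsm's side: sanlock permits only one live holder per (lockspace, host_id) pair, so a second host presenting the same id is rejected. A toy model of that rule in plain Python (not sanlock's real API; hostnames are taken from this thread):

```python
# Toy model (plain Python, NOT sanlock's real API) of why two hosts
# configured with the same host_id cannot both join a storage domain's
# lockspace: only one live holder is allowed per (lockspace, host_id),
# so the second joiner fails -- compare the "Sanlock lockspace add
# failure" and "conflicts with name of list1" messages above.

class Lockspace:
    def __init__(self, sd_uuid):
        self.sd_uuid = sd_uuid
        self.holders = {}                    # host_id -> hostname

    def add_lockspace(self, host_id, hostname):
        holder = self.holders.get(host_id)
        if holder is not None and holder != hostname:
            raise RuntimeError("host_id %d busy: held by %s in %s"
                               % (host_id, holder, self.sd_uuid))
        self.holders[host_id] = hostname

ls = Lockspace("207221b2-959b-426b-b945-18e1adfed62f")
ls.add_lockspace(1, "kvm-ldn-03")            # first joiner with id 1 succeeds
try:
    ls.add_lockspace(1, "kvm-ldn-01")        # same id from another host fails
except RuntimeError as err:
    print(err)
```

Giving each host a distinct host_id (matching its spm_id) removes the collision, which is the fix being discussed in this thread.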
>>>>>>>>>>>>> >>>>>>>>>>>>> From /var/log/ovirt-hosted-engine-ha/agent.log on the same host: >>>>>>>>>>>>> >>>>>>>>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>>>>> 13:30:50,592::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) >>>>>>>>>>>>> Failed to start monitoring domain >>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>> MainThread::WARNING::2017-06-19 >>>>>>>>>>>>> 13:30:50,593::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>> Error while monitoring engine: Failed to start monitoring domain >>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>> MainThread::WARNING::2017-06-19 >>>>>>>>>>>>> 13:30:50,593::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>> Unexpected error >>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>> File >>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>>>> line 443, in start_monitoring >>>>>>>>>>>>> self._initialize_domain_monitor() >>>>>>>>>>>>> File >>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>>>> line 823, in _initialize_domain_monitor >>>>>>>>>>>>> raise Exception(msg) >>>>>>>>>>>>> Exception: Failed to start monitoring domain >>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>> MainThread::ERROR::2017-06-19 >>>>>>>>>>>>> 13:30:50,593::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>> Shutting down the agent because of 3 failures in a row! 
>>>>>>>>>>>>> >>>>>>>>>>>>> From sanlock.log: >>>>>>>>>>>>> >>>>>>>>>>>>> 2017-06-29 11:17:06+0100 1194149 [2530]: add_lockspace >>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 >>>>>>>>>>>>> conflicts with name of list1 s5 >>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 >>>>>>>>>>>>> >>>>>>>>>>>>> From the two other hosts: >>>>>>>>>>>>> >>>>>>>>>>>>> host 2: >>>>>>>>>>>>> >>>>>>>>>>>>> vdsm.log >>>>>>>>>>>>> >>>>>>>>>>>>> 2017-06-29 10:53:47,755+0100 ERROR (jsonrpc/4) >>>>>>>>>>>>> [jsonrpc.JsonRpcServer] >>>>>>>>>>>>> Internal server error (__init__:570) >>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>> File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", >>>>>>>>>>>>> line >>>>>>>>>>>>> 565, in _handle_request >>>>>>>>>>>>> res = method(**params) >>>>>>>>>>>>> File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line >>>>>>>>>>>>> 202, in _dynamicMethod >>>>>>>>>>>>> result = fn(*methodArgs) >>>>>>>>>>>>> File "/usr/share/vdsm/API.py", line 1454, in >>>>>>>>>>>>> getAllVmIoTunePolicies >>>>>>>>>>>>> io_tune_policies_dict = self._cif.getAllVmIoTunePolicies() >>>>>>>>>>>>> File "/usr/share/vdsm/clientIF.py", line 448, in >>>>>>>>>>>>> getAllVmIoTunePolicies >>>>>>>>>>>>> 'current_values': v.getIoTune()} >>>>>>>>>>>>> File "/usr/share/vdsm/virt/vm.py", line 2803, in getIoTune >>>>>>>>>>>>> result = self.getIoTuneResponse() >>>>>>>>>>>>> File "/usr/share/vdsm/virt/vm.py", line 2816, in >>>>>>>>>>>>> getIoTuneResponse >>>>>>>>>>>>> res = self._dom.blockIoTune( >>>>>>>>>>>>> File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", >>>>>>>>>>>>> line >>>>>>>>>>>>> 47, in __getattr__ >>>>>>>>>>>>> % self.vmid) >>>>>>>>>>>>> NotConnectedError: VM u'a79e6b0e-fff4-4cba-a02c-4c00be151300' was >>>>>>>>>>>>> not >>>>>>>>>>>>> started yet or was shut down >>>>>>>>>>>>> >>>>>>>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log >>>>>>>>>>>>> 
>>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>> 10:56:33,636::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) >>>>>>>>>>>>> Found OVF_STORE: imgUUID:222610db-7880-4f4f-8559-a3635fd73555, >>>>>>>>>>>>> volUUID:c6e0d29b-eabf-4a09-a330-df54cfdd73f1 >>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>> 10:56:33,926::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>>>>>>>>> Extracting Engine VM OVF from the OVF_STORE >>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>> 10:56:33,938::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>>>>>>>>> OVF_STORE volume path: >>>>>>>>>>>>> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/images/222610db-7880-4f4f-8559-a3635fd73555/c6e0d29b-eabf-4a09-a330-df54cfdd73f1 >>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>> 10:56:33,967::config::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>>>>>>>>> Found an OVF for HE VM, trying to convert >>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>> 10:56:33,971::config::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>>>>>>>>> Got vm.conf from OVF_STORE >>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>> 10:56:36,736::states::678::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) >>>>>>>>>>>>> Score is 0 due to unexpected vm shutdown at Thu Jun 29 10:53:59 >>>>>>>>>>>>> 2017 >>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>> 10:56:36,736::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>> Current state EngineUnexpectedlyDown (score: 0) >>>>>>>>>>>>> MainThread::INFO::2017-06-29 >>>>>>>>>>>>> 10:56:46,772::config::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) >>>>>>>>>>>>> Reloading vm.conf from the 
shared storage domain >>>>>>>>>>>>> >>>>>>>>>>>>> /var/log/messages: >>>>>>>>>>>>> >>>>>>>>>>>>> Jun 29 10:53:46 kvm-ldn-02 kernel: dd: sending ioctl 80306d02 to >>>>>>>>>>>>> a partition! >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> host 1: >>>>>>>>>>>>> >>>>>>>>>>>>> /var/log/messages also in sanlock.log >>>>>>>>>>>>> >>>>>>>>>>>>> Jun 29 11:01:02 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:02+0100 >>>>>>>>>>>>> 678325 [9132]: s4531 delta_acquire host_id 1 busy1 1 2 1193177 >>>>>>>>>>>>> 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03 >>>>>>>>>>>>> Jun 29 11:01:03 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:03+0100 >>>>>>>>>>>>> 678326 [24159]: s4531 add_lockspace fail result -262 >>>>>>>>>>>>> >>>>>>>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log: >>>>>>>>>>>>> >>>>>>>>>>>>> MainThread::ERROR::2017-06-27 >>>>>>>>>>>>> 15:21:01,143::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) >>>>>>>>>>>>> Failed to start monitoring domain >>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>> MainThread::WARNING::2017-06-27 >>>>>>>>>>>>> 15:21:01,144::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>> Error while monitoring engine: Failed to start monitoring domain >>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>> MainThread::WARNING::2017-06-27 >>>>>>>>>>>>> 15:21:01,144::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>> Unexpected error >>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>> File >>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>>>> line 443, in start_monitoring >>>>>>>>>>>>> self._initialize_domain_monitor() >>>>>>>>>>>>> File >>>>>>>>>>>>> 
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>>>>>>>>> line 823, in _initialize_domain_monitor >>>>>>>>>>>>> raise Exception(msg) >>>>>>>>>>>>> Exception: Failed to start monitoring domain >>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout >>>>>>>>>>>>> during domain acquisition >>>>>>>>>>>>> MainThread::ERROR::2017-06-27 >>>>>>>>>>>>> 15:21:01,144::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>>>>>>>>> Shutting down the agent because of 3 failures in a row! >>>>>>>>>>>>> MainThread::INFO::2017-06-27 >>>>>>>>>>>>> 15:21:06,717::hosted_engine::848::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) >>>>>>>>>>>>> VDSM domain monitor status: PENDING >>>>>>>>>>>>> MainThread::INFO::2017-06-27 >>>>>>>>>>>>> 15:21:09,335::hosted_engine::776::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) >>>>>>>>>>>>> Failed to stop monitoring domain >>>>>>>>>>>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f): Storage domain is >>>>>>>>>>>>> member of pool: u'domain=207221b2-959b-426b-b945-18e1adfed62f' >>>>>>>>>>>>> MainThread::INFO::2017-06-27 >>>>>>>>>>>>> 15:21:09,339::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(run) >>>>>>>>>>>>> Agent shutting down >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for any help, >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Cam >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Jun 28, 2017 at 11:25 AM, cmc <[email protected]> wrote: >>>>>>>>>>>>>> Hi Martin, >>>>>>>>>>>>>> >>>>>>>>>>>>>> yes, on two of the machines they have the same host_id. The >>>>>>>>>>>>>> other has >>>>>>>>>>>>>> a different host_id. >>>>>>>>>>>>>> >>>>>>>>>>>>>> To update since yesterday: I reinstalled and deployed Hosted >>>>>>>>>>>>>> Engine on >>>>>>>>>>>>>> the other host (so all three hosts in the cluster now have it >>>>>>>>>>>>>> installed). 
The second one I deployed said it was able to host >>>>>>>>>>>>>> the >>>>>>>>>>>>>> engine (unlike the first I reinstalled), so I tried putting the >>>>>>>>>>>>>> host >>>>>>>>>>>>>> with the Hosted Engine on it into maintenance to see if it would >>>>>>>>>>>>>> migrate over. It managed to move all hosts but the Hosted >>>>>>>>>>>>>> Engine. And >>>>>>>>>>>>>> now the host that said it was able to host the engine says >>>>>>>>>>>>>> 'unavailable due to HA score'. The host that it was trying to >>>>>>>>>>>>>> move >>>>>>>>>>>>>> from is now in 'preparing for maintenance' for the last 12 hours. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The summary is: >>>>>>>>>>>>>> >>>>>>>>>>>>>> kvm-ldn-01 - one of the original, pre-Hosted Engine hosts, >>>>>>>>>>>>>> reinstalled >>>>>>>>>>>>>> with 'Deploy Hosted Engine'. No icon saying it can host the >>>>>>>>>>>>>> Hosted >>>>>>>>>>>>>> Engine, host_id of '2' in >>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf. >>>>>>>>>>>>>> 'add_lockspace' fails in sanlock.log >>>>>>>>>>>>>> >>>>>>>>>>>>>> kvm-ldn-02 - the other host that was pre-existing before Hosted >>>>>>>>>>>>>> Engine >>>>>>>>>>>>>> was created. Reinstalled with 'Deploy Hosted Engine'. Had an icon >>>>>>>>>>>>>> saying that it was able to host the Hosted Engine, but after >>>>>>>>>>>>>> migration >>>>>>>>>>>>>> was attempted when putting kvm-ldn-03 into maintenance, it >>>>>>>>>>>>>> reports: >>>>>>>>>>>>>> 'unavailable due to HA score'. It has a host_id of '1' in >>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf. No errors in >>>>>>>>>>>>>> sanlock.log >>>>>>>>>>>>>> >>>>>>>>>>>>>> kvm-ldn-03 - this was the host I deployed Hosted Engine on, >>>>>>>>>>>>>> which was >>>>>>>>>>>>>> not part of the original cluster. I restored the bare-metal >>>>>>>>>>>>>> engine >>>>>>>>>>>>>> backup in the Hosted Engine on this host when deploying it, >>>>>>>>>>>>>> without >>>>>>>>>>>>>> error.
It currently has the Hosted Engine on it (as the only VM >>>>>>>>>>>>>> after >>>>>>>>>>>>>> I put that host into maintenance to test the HA of Hosted >>>>>>>>>>>>>> Engine). >>>>>>>>>>>>>> Sanlock log shows conflicts >>>>>>>>>>>>>> >>>>>>>>>>>>>> I will look through all the logs for any other errors. Please >>>>>>>>>>>>>> let me >>>>>>>>>>>>>> know if you need any logs or other clarification/information. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Campbell >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Jun 28, 2017 at 9:25 AM, Martin Sivak >>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> can you please check the contents of >>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf or >>>>>>>>>>>>>>> /etc/ovirt-hosted-engine-ha/agent.conf (I am not sure which one >>>>>>>>>>>>>>> it is >>>>>>>>>>>>>>> right now) and search for host-id? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Make sure the IDs are different. If they are not, then there is >>>>>>>>>>>>>>> a bug somewhere. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Martin >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 6:26 PM, cmc <[email protected]> wrote: >>>>>>>>>>>>>>>> I see this on the host it is trying to migrate in >>>>>>>>>>>>>>>> /var/log/sanlock: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2017-06-27 17:10:40+0100 527703 [2407]: s3528 lockspace >>>>>>>>>>>>>>>> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 >>>>>>>>>>>>>>>> 2017-06-27 17:13:00+0100 527843 [27446]: s3528 delta_acquire >>>>>>>>>>>>>>>> host_id 1 >>>>>>>>>>>>>>>> busy1 1 2 1042692 >>>>>>>>>>>>>>>> 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03 >>>>>>>>>>>>>>>> 2017-06-27 17:13:01+0100 527844 [2407]: s3528 add_lockspace >>>>>>>>>>>>>>>> fail result -262 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The sanlock service is running. Why would this occur? 
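The "delta_acquire host_id 1 busy1" line quoted above answers the "why would this occur?" question: another host (named at the end of the busy line, kvm-ldn-03 here) already holds the delta lease for host_id 1 in that lockspace, so add_lockspace fails with result -262. A throwaway helper, not part of oVirt, for pulling the conflicting id and current holder out of such a log line:

```python
# Hypothetical helper: extract (host_id, holder_host) from a sanlock
# "delta_acquire ... busy" log line like the ones quoted in this thread.

SAMPLE = ("2017-06-27 17:13:00+0100 527843 [27446]: s3528 delta_acquire "
          "host_id 1 busy1 1 2 1042692 "
          "3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03")

def busy_holder(line):
    """Return (host_id, hostname) of the current lease holder, or None."""
    tokens = line.split()
    if "delta_acquire" not in tokens or "host_id" not in tokens:
        return None
    host_id = int(tokens[tokens.index("host_id") + 1])
    # the last token is <generation-uuid>.<hostname> of the holder
    hostname = tokens[-1].rsplit(".", 1)[-1]
    return host_id, hostname

print(busy_holder(SAMPLE))   # -> (1, 'kvm-ldn-03')
```

In other words, the host being added is configured with the same host_id that kvm-ldn-03 is already using, consistent with the duplicate ids found in hosted-engine.conf.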
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> C >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 5:21 PM, cmc <[email protected]> wrote: >>>>>>>>>>>>>>>>> Hi Martin, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks for the reply. I have done this, and the deployment >>>>>>>>>>>>>>>>> completed >>>>>>>>>>>>>>>>> without error. However, it still will not allow the Hosted >>>>>>>>>>>>>>>>> Engine >>>>>>>>>>>>>>>>> migrate to another host. The >>>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf got created ok on >>>>>>>>>>>>>>>>> the host >>>>>>>>>>>>>>>>> I re-installed, but the ovirt-ha-broker.service, though it >>>>>>>>>>>>>>>>> starts, >>>>>>>>>>>>>>>>> reports: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> --------------------8<------------------- >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Jun 27 14:58:26 kvm-ldn-01 systemd[1]: Starting oVirt Hosted >>>>>>>>>>>>>>>>> Engine >>>>>>>>>>>>>>>>> High Availability Communications Broker... >>>>>>>>>>>>>>>>> Jun 27 14:58:27 kvm-ldn-01 ovirt-ha-broker[6101]: >>>>>>>>>>>>>>>>> ovirt-ha-broker >>>>>>>>>>>>>>>>> ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker >>>>>>>>>>>>>>>>> ERROR >>>>>>>>>>>>>>>>> Failed to read metadata from >>>>>>>>>>>>>>>>> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata >>>>>>>>>>>>>>>>> Traceback >>>>>>>>>>>>>>>>> (most >>>>>>>>>>>>>>>>> recent call last): >>>>>>>>>>>>>>>>> File >>>>>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", >>>>>>>>>>>>>>>>> line 129, in get_raw_stats_for_service_type >>>>>>>>>>>>>>>>> f = >>>>>>>>>>>>>>>>> os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC) >>>>>>>>>>>>>>>>> OSError: >>>>>>>>>>>>>>>>> [Errno 2] >>>>>>>>>>>>>>>>> No such file or directory: >>>>>>>>>>>>>>>>> '/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata' >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 
--------------------8<------------------- >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I checked the path, and it exists. I can run 'less -f' on it >>>>>>>>>>>>>>>>> fine. The >>>>>>>>>>>>>>>>> perms are slightly different on the host that is running the >>>>>>>>>>>>>>>>> VM vs the >>>>>>>>>>>>>>>>> one that is reporting errors (600 vs 660), ownership is >>>>>>>>>>>>>>>>> vdsm:qemu. Is >>>>>>>>>>>>>>>>> this a san locking issue? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks for any help, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:41 PM, Martin Sivak >>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>> Should it be? It was not in the instructions for the >>>>>>>>>>>>>>>>>>> migration from >>>>>>>>>>>>>>>>>>> bare-metal to Hosted VM >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The hosted engine will only migrate to hosts that have the >>>>>>>>>>>>>>>>>> services >>>>>>>>>>>>>>>>>> running. Please put one other host to maintenance and select >>>>>>>>>>>>>>>>>> Hosted >>>>>>>>>>>>>>>>>> engine action: DEPLOY in the reinstall dialog. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Best regards >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Martin Sivak >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:23 PM, cmc <[email protected]> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>> I changed the 'os.other.devices.display.protocols.value.3.6 >>>>>>>>>>>>>>>>>>> = >>>>>>>>>>>>>>>>>>> spice/qxl,vnc/cirrus,vnc/qxl' line to have the same display >>>>>>>>>>>>>>>>>>> protocols >>>>>>>>>>>>>>>>>>> as 4 and the hosted engine now appears in the list of VMs. >>>>>>>>>>>>>>>>>>> I am >>>>>>>>>>>>>>>>>>> guessing the compatibility version was causing it to use >>>>>>>>>>>>>>>>>>> the 3.6 >>>>>>>>>>>>>>>>>>> version. However, I am still unable to migrate the engine >>>>>>>>>>>>>>>>>>> VM to >>>>>>>>>>>>>>>>>>> another host. 
When I try putting the host it is currently >>>>>>>>>>>>>>>>>>> on into >>>>>>>>>>>>>>>>>>> maintenance, it reports: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Error while executing action: Cannot switch the Host(s) to >>>>>>>>>>>>>>>>>>> Maintenance mode. >>>>>>>>>>>>>>>>>>> There are no available hosts capable of running the engine >>>>>>>>>>>>>>>>>>> VM. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Running 'hosted-engine --vm-status' still shows 'Engine >>>>>>>>>>>>>>>>>>> status: >>>>>>>>>>>>>>>>>>> unknown stale-data'. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The ovirt-ha-broker service is only running on one host. It >>>>>>>>>>>>>>>>>>> was set to >>>>>>>>>>>>>>>>>>> 'disabled' in systemd. It won't start as there is no >>>>>>>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf on the other >>>>>>>>>>>>>>>>>>> two hosts. >>>>>>>>>>>>>>>>>>> Should it be? It was not in the instructions for the >>>>>>>>>>>>>>>>>>> migration from >>>>>>>>>>>>>>>>>>> bare-metal to Hosted VM >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 1:07 PM, cmc <[email protected]> >>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> Hi Tomas, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> So in my >>>>>>>>>>>>>>>>>>>> /usr/share/ovirt-engine/conf/osinfo-defaults.properties on >>>>>>>>>>>>>>>>>>>> my >>>>>>>>>>>>>>>>>>>> engine VM, I have: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> os.other.devices.display.protocols.value = >>>>>>>>>>>>>>>>>>>> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus >>>>>>>>>>>>>>>>>>>> os.other.devices.display.protocols.value.3.6 = >>>>>>>>>>>>>>>>>>>> spice/qxl,vnc/cirrus,vnc/qxl >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> That seems to match - I assume since this is 4.1, the 3.6 >>>>>>>>>>>>>>>>>>>> should not apply >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Is there somewhere else I should be looking? 
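The validation that produces ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS boils down to a membership test: the imported VM's (graphics, video) pair must appear in the osinfo display-protocols list that applies to its compatibility level. A minimal sketch of that check (not actual engine code), using the two value lines quoted in this thread:

```python
# Minimal sketch (not engine code) of the osinfo display-type check:
# a VM import passes only if its (graphics, video) pair appears in the
# configured os.other.devices.display.protocols value for its level.

def allowed_pairs(osinfo_value):
    """Parse 'spice/qxl,vnc/cirrus,...' into a set of (graphics, video)."""
    return {tuple(item.split("/")) for item in osinfo_value.split(",")}

pairs_36 = allowed_pairs("spice/qxl,vnc/cirrus,vnc/qxl")          # 3.6 line
pairs_40 = allowed_pairs("spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus")  # 4.0+ line

vm = ("vnc", "vga")              # e.g. a VM whose video changed cirrus -> vga
print(vm in pairs_36)            # False -> import rejected under the 3.6 list
print(vm in pairs_40)            # True  -> accepted once the lists match
```

This matches the fix described above: extending the 3.6 value line with the missing pairs made the hosted engine VM importable.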
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Cam >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek >>>>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek >>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> > On 22 Jun 2017, at 12:31, Martin Sivak >>>>>>>>>>>>>>>>>>>>>> > <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > Tomas, what fields are needed in a VM to pass the >>>>>>>>>>>>>>>>>>>>>> > check that causes >>>>>>>>>>>>>>>>>>>>>> > the following error? >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> >>>>> WARN >>>>>>>>>>>>>>>>>>>>>> >>>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] >>>>>>>>>>>>>>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation >>>>>>>>>>>>>>>>>>>>>> >>>>> of action >>>>>>>>>>>>>>>>>>>>>> >>>>> 'ImportVm' >>>>>>>>>>>>>>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: >>>>>>>>>>>>>>>>>>>>>> >>>>> VAR__ACTION__IMPORT >>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> to match the OS and VM Display type;-) >>>>>>>>>>>>>>>>>>>>>> Configuration is in osinfo….e.g. 
if that is import from >>>>>>>>>>>>>>>>>>>>>> older releases on >>>>>>>>>>>>>>>>>>>>>> Linux this is typically caused by the cahgen of cirrus >>>>>>>>>>>>>>>>>>>>>> to vga for non-SPICE >>>>>>>>>>>>>>>>>>>>>> VMs >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> yep, the default supported combinations for 4.0+ is this: >>>>>>>>>>>>>>>>>>>>> os.other.devices.display.protocols.value = >>>>>>>>>>>>>>>>>>>>> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > Thanks. >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > On Thu, Jun 22, 2017 at 12:19 PM, cmc >>>>>>>>>>>>>>>>>>>>>> > <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>> >> Hi Martin, >>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>>> >>> just as a random comment, do you still have the >>>>>>>>>>>>>>>>>>>>>> >>> database backup from >>>>>>>>>>>>>>>>>>>>>> >>> the bare metal -> VM attempt? It might be possible >>>>>>>>>>>>>>>>>>>>>> >>> to just try again >>>>>>>>>>>>>>>>>>>>>> >>> using it. Or in the worst case.. update the >>>>>>>>>>>>>>>>>>>>>> >>> offending value there >>>>>>>>>>>>>>>>>>>>>> >>> before restoring it to the new engine instance. >>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>> >> I still have the backup. I'd rather do the latter, as >>>>>>>>>>>>>>>>>>>>>> >> re-running the >>>>>>>>>>>>>>>>>>>>>> >> HE deployment is quite lengthy and involved (I have >>>>>>>>>>>>>>>>>>>>>> >> to re-initialise >>>>>>>>>>>>>>>>>>>>>> >> the FC storage each time). Do you know what the >>>>>>>>>>>>>>>>>>>>>> >> offending value(s) >>>>>>>>>>>>>>>>>>>>>> >> would be? Would it be in the Postgres DB or in a >>>>>>>>>>>>>>>>>>>>>> >> config file >>>>>>>>>>>>>>>>>>>>>> >> somewhere? 
> Cheers,
>
> Cam
>
>> Regards,
>>
>> Martin Sivak

On Thu, Jun 22, 2017 at 11:39 AM, cmc <[email protected]> wrote:
> Hi Yanir,
>
> Thanks for the reply.
>
>> First of all, maybe a chain reaction of:
>>
>>   WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>   (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
>>   failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>   ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>
>> is causing the hosted engine vm not to be set up correctly, and further
>> actions were made when the hosted engine vm wasn't in a stable state.
>>
>> As for now, are you trying to revert back to a previous/initial state?
>
> I'm not trying to revert it to a previous state for now. This was a
> migration from a bare metal engine, and it didn't report any error
> during the migration. I'd had some problems on my first attempts at
> this migration, whereby it never completed (due to a proxy issue), but
> I managed to resolve this. Do you know of a way to get the Hosted
> Engine VM into a stable state, without rebuilding the entire cluster
> from scratch (since I have a lot of VMs on it)?
>
> Thanks for any help.
>
> Regards,
>
> Cam
>
>> Regards,
>> Yanir

On Wed, Jun 21, 2017 at 4:32 PM, cmc <[email protected]> wrote:
> Hi Jenny/Martin,
>
> Any idea what I can do here? The hosted engine VM has no log on any
> host in /var/log/libvirt/qemu, and I fear that if I need to put the
> host I created it on (which I think is hosting it) into maintenance,
> e.g. to upgrade it, or if it fails for any reason, the engine VM won't
> get migrated to another host and I will not be able to manage the
> cluster. It seems to be a very dangerous position to be in.
> Thanks,
>
> Cam

On Wed, Jun 21, 2017 at 11:48 AM, cmc <[email protected]> wrote:
> Thanks Martin. The hosts are all part of the same cluster.
>
> I get these errors in the engine.log on the engine:
>
>   2017-06-19 03:28:05,030Z WARN  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>   (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
>   failed for user SYSTEM. Reasons:
>   VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>   2017-06-19 03:28:05,030Z INFO  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>   (org.ovirt.thread.pool-6-thread-23) [] Lock freed to object
>   'EngineLock:{exclusiveLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<VM,
>   ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>,
>   HostedEngine=<VM_NAME, ACTION_TYPE_FAILED_NAME_ALREADY_USED>]',
>   sharedLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<REMOTE_VM,
>   ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>]'}'
>   2017-06-19 03:28:05,030Z ERROR [org.ovirt.engine.core.bll.HostedEngineImporter]
>   (org.ovirt.thread.pool-6-thread-23) [] Failed importing the Hosted Engine VM
>
> The sanlock.log reports conflicts on that same host, and a different
> error on the other hosts; not sure if they are related.
>
> And this in the /var/log/ovirt-hosted-engine-ha/agent log on the host
> which I deployed the hosted engine VM on:
>
>   MainThread::ERROR::2017-06-19 13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>   Unable to extract HEVM OVF
>   MainThread::ERROR::2017-06-19 13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>   Failed extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf
>
> I've seen some of these issues reported in bugzilla, but they were for
> older versions of oVirt (and appear to be resolved).
>
> I will install that package on the other two hosts, which I will put
> into maintenance, as vdsm is installed as an upgrade. I guess
> restarting vdsm is a good idea after that?
>
> Thanks,
>
> Campbell

On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak <[email protected]> wrote:
> Hi,
>
> you do not have to install it on all hosts. But you should have more
> than one, and ideally all hosted engine enabled nodes should belong to
> the same engine cluster.
>
> Best regards
>
> Martin Sivak

On Wed, Jun 21, 2017 at 11:29 AM, cmc <[email protected]> wrote:
> Hi Jenny,
>
> Does ovirt-hosted-engine-ha need to be installed across all hosts?
> Could that be the reason it is failing to see it properly?
>
> Thanks,
>
> Cam

On Mon, Jun 19, 2017 at 1:27 PM, cmc <[email protected]> wrote:
> Hi Jenny,
>
> Logs are attached. I can see errors in there, but am unsure how they
> arose.
>
> Thanks,
>
> Campbell

On Mon, Jun 19, 2017 at 12:29 PM, Evgenia Tokar <[email protected]> wrote:
> From the output it looks like the agent is down; try starting it by
> running:
>
>   systemctl start ovirt-ha-agent
>
> The engine is supposed to see the hosted engine storage domain and
> import it to the system, then it should import the hosted engine vm.
>
> Can you attach the agent log from the host
> (/var/log/ovirt-hosted-engine-ha/agent.log) and the engine log from
> the engine vm (/var/log/ovirt-engine/engine.log)?
>
> Thanks,
> Jenny

On Mon, Jun 19, 2017 at 12:41 PM, cmc <[email protected]> wrote:
> Hi Jenny,
>
>> What version are you running?
>
> 4.1.2.2-1.el7.centos
>
>> For the hosted engine vm to be imported and displayed in the engine,
>> you must first create a master storage domain.
>
> To provide a bit more detail: this was a migration of a bare-metal
> engine in an existing cluster to a hosted engine VM for that cluster.
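As an aside, when gathering the logs Jenny asks for, a quick pre-filter helps spot the relevant failures before attaching anything. A sketch follows; on a real host the path would be /var/log/ovirt-hosted-engine-ha/agent.log, but the commands below work on a small captured sample (lines taken from this thread) so they are reproducible anywhere:

```shell
# Pre-filter the HA agent log for ERROR lines before attaching it.
# We demonstrate on a captured sample; the same grep works on the
# live /var/log/ovirt-hosted-engine-ha/agent.log.
sample=$(mktemp)
cat > "$sample" <<'EOF'
MainThread::INFO::2017-06-19 13:09:40,100::agent::57::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent starting
MainThread::ERROR::2017-06-19 13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Unable to extract HEVM OVF
MainThread::ERROR::2017-06-19 13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Failed extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf
EOF

# Count the errors, then show them; attach the full log regardless.
error_count=$(grep -c '::ERROR::' "$sample")
grep '::ERROR::' "$sample"
```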
> As part of this migration, I built an entirely new host and ran
> 'hosted-engine --deploy' (followed these instructions:
> http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_Metal_to_an_EL-Based_Self-Hosted_Environment/).
> I restored the backup from the engine and it completed without any
> errors. I didn't see any instructions regarding a master storage
> domain in the page above. The cluster has two existing master storage
> domains: one is fibre channel, which is up, and one ISO domain, which
> is currently offline.
>
>> What do you mean the hosted engine commands are failing? What happens
>> when you run hosted-engine --vm-status now?
> Interestingly, whereas when I ran it before it exited with no output
> and a return code of '1', it now reports:
>
>   --== Host 1 status ==--
>
>   conf_on_shared_storage             : True
>   Status up-to-date                  : False
>   Hostname                           : kvm-ldn-03.ldn.fscfc.co.uk
>   Host ID                            : 1
>   Engine status                      : unknown stale-data
>   Score                              : 0
>   stopped                            : True
>   Local maintenance                  : False
>   crc32                              : 0217f07b
>   local_conf_timestamp               : 2911
>   Host timestamp                     : 2897
>   Extra metadata (valid at timestamp):
>       metadata_parse_version=1
>       metadata_feature_version=1
>       timestamp=2897 (Thu Jun 15 16:22:54 2017)
>       host-id=1
>       score=0
>       vm_conf_refresh_time=2911 (Thu Jun 15 16:23:08 2017)
>       conf_on_shared_storage=True
>       maintenance=False
>       state=AgentStopped
>       stopped=True
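For what it's worth, the "Extra metadata" section of that report is plain key=value text, so a monitoring script can pick out the agent state without reading the whole thing. A sketch against a saved copy of the output above (running `hosted-engine --vm-status` itself obviously requires a hosted-engine host):

```shell
# Extract state/score from a saved `hosted-engine --vm-status` report.
# The sample reproduces the relevant lines from the output above.
status=$(mktemp)
cat > "$status" <<'EOF'
--== Host 1 status ==--
Status up-to-date                  : False
Engine status                      : unknown stale-data
Score                              : 0
Extra metadata (valid at timestamp):
    state=AgentStopped
    stopped=True
EOF

# "state=..." lives in the indented metadata block; "Score" is a
# colon-separated report field.
state=$(sed -n 's/^[[:space:]]*state=//p' "$status")
score=$(awk -F': *' '/^Score/ {print $2}' "$status")
echo "agent state: $state, score: $score"
# -> agent state: AgentStopped, score: 0
```

A score of 0 with state=AgentStopped matches Jenny's diagnosis further down the thread: the agent simply isn't running on that host.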
> Yet I can login to the web GUI fine. I guess it is not HA due to being
> in an unknown state currently? Does the hosted-engine-ha rpm need to
> be installed across all nodes in the cluster, btw?
>
> Thanks for the help,
>
> Cam

On Thu, Jun 15, 2017 at 6:32 PM, cmc <[email protected]> wrote:
> Hi,
>
> I've migrated from a bare-metal engine to a hosted engine. There were
> no errors during the install; however, the hosted engine did not get
> started. I tried running:
>
>   hosted-engine --status
>
> on the host I deployed it on, and it returns nothing (exit code is 1,
> however). I could not ping it either. So I tried starting it via
> 'hosted-engine --vm-start' and it returned:
>
>   Virtual machine does not exist
>
> But it then became available. I logged into it successfully. It is not
> in the list of VMs, however.
>
> Any ideas why the hosted-engine commands fail, and why it is not in
> the list of virtual machines?
>
> Thanks for any help,
>
> Cam

_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users

