No problem and I am glad you found the issue.

--
Martin Sivák
[email protected]
Red Hat Czech
RHEV-M SLA / Brno, CZ
----- Original Message -----
> Ok Martin,
>
> I could track this issue down to the storage appliance; there the
> rpc.bind service is dying for some reason - so HE-HA did the correct
> thing indeed!
>
> Thanks for the help!
>
> On 22.10.2014 10:17, Martin Sivak wrote:
> > Hi,
> >
> > I think there is something weird going on with your storage, this is the
> > crash snippet from the host that had the engine at the beginning:
> >
> > /var/log/vdsm/vdsm.log:Thread-162994::ERROR::2014-10-21 20:22:33,919::task::866::Storage.TaskManager.Task::(_setError) Task=`2ad31974-e1fc-4785-9423-ff3bd087a5aa`::Unexpected error
> > /var/log/vdsm/vdsm.log:Thread-162994::ERROR::2014-10-21 20:22:33,934::dispatcher::79::Storage.Dispatcher::(wrapper) Connection timed out
> > /var/log/vdsm/vdsm.log:Thread-62::ERROR::2014-10-21 20:23:00,733::sdc::137::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain 68aad705-7c9b-427a-a84c-6f32f23675b3
> > /var/log/vdsm/vdsm.log:Thread-62::ERROR::2014-10-21 20:23:00,734::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain 68aad705-7c9b-427a-a84c-6f32f23675b3
> > /var/log/vdsm/vdsm.log:VM Channels Listener::ERROR::2014-10-21 20:23:04,258::vmchannels::54::vds::(_handle_event) Received 00000011 on fileno 53
> >
> > The second host's VDSM lost the connection to the storage domain at the same time.
> >
> > 20:23:09,950::states::437::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine vm is running on host 192.168.50.201 (id 1)
> > 20:23:12,365::hosted_engine::658::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
> >
> > The engine VM was restarted right after the connection was restored:
> >
> > 20:25:54,336::hosted_engine::658::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
> > 20:26:20,572::hosted_engine::571::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Acquired lock on host id 2
> > 20:26:20,572::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
> > 20:26:20,572::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host 192.168.50.201 (id: 1, score: 2400)
> > 20:26:30,606::states::459::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down and local host has best score (2400), attempting to start engine VM
> >
> > ...
> >
> > 20:27:34,423::state_decorators::88::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Timeout cleared while transitioning <class 'ovirt_hosted_engine_ha.agent.states.EngineStarting'> -> <class 'ovirt_hosted_engine_ha.agent.states.EngineUp'>
> > 20:27:34,430::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1413916054.43 type=state_transition detail=EngineStarting-EngineUp hostname='nodehv02.lab.mbox.loc'
> > 20:27:34,498::brokerlink::120::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineStarting-EngineUp) sent? sent
> > 20:27:38,481::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUp (score: 2400)
> >
> > All was then well till the end of the log.
> >
> > 20:29:53,393::states::394::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine vm running on localhost
> > 20:29:55,372::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUp (score: 2400)
> > 20:29:55,372::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host 192.168.50.201 (id: 1, score: 0)
> >
> > Hosted engine had nothing to do with the engine crash according to the log.
> > On the contrary, it properly restarted the VM once the cluster recovered
> > from the storage issue.
> >
> > Can you give us more information about the setup? Storage type, topology,
> > ...
> >
> > --
> > Martin Sivák
> > [email protected]
> > Red Hat Czech
> > RHEV-M SLA / Brno, CZ
> >
> > ----- Original Message -----
> >> Hello,
> >>
> >> since upgrading to the latest hosted-engine-ha I have the following
> >> problem:
> >>
> >> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
> >> Engine vm died unexpectedly
> >>
> >> I suppose HA is forcing the engine down because the liveliness check is
> >> failing. I attached a log compiled from the latest incident, 2014-10-21
> >> 16:26:31,836. The 'host' logs are from the host the engine was running
> >> on; 'host2' is the other HA host.
> >> Interestingly, this only happens when I am connected via a VNC console
> >> to one of my Windows 2012 VMs.
> >>
> >> How can I further debug this?
> >> The engine log seems empty and the HE itself does not seem to have any
> >> trouble when this happens. As a precaution / test I set my cluster to
> >> global maintenance.
> >>
> >> Thanks,
> >>
> >> vdsm-python-zombiereaper-4.16.7-1.gitdb83943.el6.noarch
> >> vdsm-xmlrpc-4.16.7-1.gitdb83943.el6.noarch
> >> vdsm-4.16.7-1.gitdb83943.el6.x86_64
> >> vdsm-python-4.16.7-1.gitdb83943.el6.noarch
> >> vdsm-yajsonrpc-4.16.7-1.gitdb83943.el6.noarch
> >> vdsm-jsonrpc-4.16.7-1.gitdb83943.el6.noarch
> >> vdsm-cli-4.16.7-1.gitdb83943.el6.noarch
> >>
> >> ovirt-hosted-engine-ha-1.2.4-1.el6.noarch
> >> ovirt-release35-001-1.noarch
> >> ovirt-host-deploy-1.3.0-1.el6.noarch
> >> ovirt-hosted-engine-setup-1.2.1-1.el6.noarch
> >> ovirt-release34-1.0.3-1.noarch
> >> ovirt-engine-sdk-python-3.5.0.7-1.el6.noarch
> >>
> >> --
> >> Daniel Helgenberger
> >> m box bewegtbild GmbH
> >>
> >> P: +49/30/2408781-22
> >> F: +49/30/2408781-10
> >>
> >> ACKERSTR. 19
> >> D-10115 BERLIN
> >>
> >> www.m-box.de www.monkeymen.tv
> >>
> >> Geschäftsführer: Martin Retschitzegger / Michaela Göllner
> >> Handelsregister: Amtsgericht Charlottenburg / HRB 112767
> >>
> >> _______________________________________________
> >> Users mailing list
> >> [email protected]
> >> http://lists.ovirt.org/mailman/listinfo/users
>
> --
> Daniel Helgenberger
> m box bewegtbild GmbH
>
> P: +49/30/2408781-22
> F: +49/30/2408781-10
>
> ACKERSTR. 19
> D-10115 BERLIN
>
> www.m-box.de www.monkeymen.tv
>
> Geschäftsführer: Martin Retschitzegger / Michaela Göllner
> Handelsregister: Amtsgericht Charlottenburg / HRB 112767
>
> _______________________________________________
> Users mailing list
> [email protected]
> http://lists.ovirt.org/mailman/listinfo/users
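For anyone who lands on this thread with the same symptom: the root cause turned out to be the storage appliance's rpc.bind (portmapper) service dying. A quick way to check from any host is `rpcinfo -p <storage-host>`. Below is a minimal sketch of that check; the service list is an assumption for a typical NFS appliance, and the sample output is hard-coded so the parsing logic is self-contained rather than depending on a live appliance:

```shell
#!/bin/sh
# Sketch: confirm the appliance's portmapper still registers the NFS
# services. On a live setup you would capture this with:
#   rpcinfo -p "$STORAGE_HOST"
# The sample output below is hard-coded (an assumption, shaped like a
# typical rpcinfo dump) so the check itself runs anywhere.
sample='   100000    4   tcp    111  portmapper
   100003    3   tcp   2049  nfs
   100005    3   tcp  20048  mountd'

for svc in portmapper nfs mountd; do
    if printf '%s\n' "$sample" | grep -qw "$svc"; then
        echo "$svc: registered"
    else
        echo "$svc: MISSING - rpc.bind may have died"
    fi
done
```

If portmapper itself stops answering, every host will lose the storage domain at once, which matches the simultaneous VDSM "Connection timed out" errors quoted above.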
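On the debugging question: besides `hosted-engine --vm-status` for the current agent state (and `hosted-engine --set-maintenance --mode=global` for the maintenance mode mentioned above), the state transitions Martin quoted live in /var/log/ovirt-hosted-engine-ha/agent.log. A small sketch for pulling them out so a sequence like EngineDown -> EngineStarting -> EngineUp is easy to follow; a sample line shaped like the ones in the thread is used here so the command is self-contained:

```shell
#!/bin/sh
# Sketch: extract HA state transitions from the agent log.
# On a live host, point this at /var/log/ovirt-hosted-engine-ha/agent.log;
# here we use a temp file with a sample line copied from the thread.
log=$(mktemp)
cat > "$log" <<'EOF'
20:27:34,430::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1413916054.43 type=state_transition detail=EngineStarting-EngineUp hostname='nodehv02.lab.mbox.loc'
EOF

# Each state_transition notification carries a detail=Old-New token.
grep -o "detail=[A-Za-z]*-[A-Za-z]*" "$log"
rm -f "$log"
```

Running this prints `detail=EngineStarting-EngineUp`; against the real agent.log it yields the full transition history in order, which is usually enough to tell whether the agent killed the VM or merely reacted to it dying.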

