On Mon, Aug 22, 2022 at 9:52 AM <markec...@gmail.com> wrote:
>
> Hi!
> In the last few days I am having problem with Hosted-Engine, it keeps 
> restarting. Sometimes after few minutes, sometimes after few hours..
> I haven't done any changes on oVirt or network in that time.
> Version is 4.4.10.7-1.el8. (this was also installation version)
>
> Here are the logs:
> Agent.log------------------------------
> MainThread::INFO::2022-08-21 
> 09:48:36,200::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
>  Current state EngineUp (score: 2440)

2440 is pretty low - did you check what lowers it, other than the
single failed network test below?

> MainThread::INFO::2022-08-21 
> 09:48:36,200::hosted_engine::525::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
>  Best remote host node3.ovirt.example.org (id: 3, score: 2440)
> MainThread::ERROR::2022-08-21 
> 09:48:46,212::states::398::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>  Host node3.ovirt.example.org (id 3) score is significantly better than local 
> score, shutting down VM on this host
> MainThread::INFO::2022-08-21 
> 09:48:46,641::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Success, was notification of state_transition (EngineUp-EngineStop) sent? 
> ignored
> MainThread::INFO::2022-08-21 
> 09:48:46,706::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
>  Current state EngineStop (score: 3400)
> MainThread::INFO::2022-08-21 
> 09:48:46,706::hosted_engine::525::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
>  Best remote host node3.ovirt.example.org (id: 3, score: 3400)
> MainThread::INFO::2022-08-21 
> 09:48:56,714::hosted_engine::934::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm)
>  Shutting down vm using `/usr/sbin/hosted-engine --vm-shutdown`
> MainThread::INFO::2022-08-21 
> 09:48:56,871::hosted_engine::941::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm)
>  stdout:
> MainThread::INFO::2022-08-21 
> 09:48:56,871::hosted_engine::942::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm)
>  stderr:
> MainThread::ERROR::2022-08-21 
> 09:48:56,871::hosted_engine::950::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm)
>  Engine VM stopped on localhost
> MainThread::INFO::2022-08-21 
> 09:48:56,880::state_decorators::102::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
>  Timeout set to Sun Aug 21 09:53:56 2022 while transitioning <class 
> 'ovirt_hosted_engine_ha.agent.states.EngineStop'> -> <class 
> 'ovirt_hosted_engine_ha.agent.states.EngineStop'>
> MainThread::INFO::2022-08-21 
> 09:48:56,959::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
>  Current state EngineStop (score: 3400)
> MainThread::INFO::2022-08-21 
> 09:49:06,977::states::537::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>  Engine vm not running on local host
> MainThread::INFO::2022-08-21 
> 09:49:06,983::state_decorators::95::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
>  Timeout cleared while transitioning <class 
> 'ovirt_hosted_engine_ha.agent.states.EngineStop'> -> <class 
> 'ovirt_hosted_engine_ha.agent.states.EngineDown'>
> MainThread::INFO::2022-08-21 
> 09:49:07,173::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Success, was notification of state_transition (EngineStop-EngineDown) sent? 
> ignored
> MainThread::INFO::2022-08-21 
> 09:49:07,795::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
>  Current state EngineDown (score: 3400)
> MainThread::INFO::2022-08-21 
> 09:49:16,811::states::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>  Engine down and local host has best score (3400), attempting to start engine 
> VM
> MainThread::INFO::2022-08-21 
> 09:49:16,998::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Success, was notification of state_transition (EngineDown-EngineStart) sent? 
> ignored
> MainThread::INFO::2022-08-21 
> 09:49:17,179::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
>  Current state EngineStart (score: 3400)
> MainThread::INFO::2022-08-21 
> 09:49:17,195::hosted_engine::895::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state)
>  Ensuring VDSM state is clear for engine VM
> MainThread::INFO::2022-08-21 
> 09:49:17,200::hosted_engine::915::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state)
>  Cleaning state for non-running VM
> MainThread::INFO::2022-08-21 
> 09:49:18,211::hosted_engine::907::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state)
>  Vdsm state for VM clean
> MainThread::INFO::2022-08-21 
> 09:49:18,212::hosted_engine::853::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm)
>  Starting vm using `/usr/sbin/hosted-engine --vm-start`
> MainThread::INFO::2022-08-21 
> 09:49:18,814::hosted_engine::862::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm)
>  stdout: VM in WaitForLaunch
>
> MainThread::INFO::2022-08-21 
> 09:49:18,814::hosted_engine::863::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm)
>  stderr: Command VM.getStats with args {'vmID': 
> 'cc7931ff-8124-4724-9242-abea2ab5bf42'} failed:
> (code=1, message=Virtual machine does not exist: {'vmId': 
> 'cc7931ff-8124-4724-9242-abea2ab5bf42'})
>
> MainThread::INFO::2022-08-21 
> 09:49:18,814::hosted_engine::875::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm)
>  Engine VM started on localhost
> MainThread::INFO::2022-08-21 
> 09:49:18,999::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Success, was notification of state_transition (EngineStart-EngineStarting) 
> sent? ignored
> MainThread::INFO::2022-08-21 
> 09:49:19,008::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
>  Current state EngineStarting (score: 3400)
> MainThread::INFO::2022-08-21 
> 09:49:29,027::states::741::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>  VM is powering up..
> MainThread::INFO::2022-08-21 
> 09:49:29,033::state_decorators::102::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
>  Timeout set to Sun Aug 21 09:59:29 2022 while transitioning <class 
> 'ovirt_hosted_engine_ha.agent.states.EngineStarting'> -> <class 
> 'ovirt_hosted_engine_ha.agent.states.EngineStarting'>
> MainThread::INFO::2022-08-21 
> 09:49:29,109::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
>  Current state EngineStarting (score: 3400)
> MainThread::INFO::2022-08-21 
> 09:49:38,121::states::741::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>  VM is powering up..
> MainThread::INFO::2022-08-21 
> 09:49:38,195::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
>  Current state EngineStarting (score: 3400)
> MainThread::INFO::2022-08-21 
> 09:49:48,218::state_decorators::95::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
>  Timeout cleared while transitioning <class 
> 'ovirt_hosted_engine_ha.agent.states.EngineStarting'> -> <class 
> 'ovirt_hosted_engine_ha.agent.states.EngineUp'>
> MainThread::INFO::2022-08-21 
> 09:49:48,403::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Success, was notification of state_transition (EngineStarting-EngineUp) 
> sent? ignored
> MainThread::INFO::2022-08-21 
> 09:49:48,713::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
>  Current state EngineUp (score: 3400)
> MainThread::INFO::2022-08-21 
> 09:49:58,725::states::406::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>  Engine vm running on localhost
>
>
>
> Broker.log------------------------------
> Thread-4::INFO::2022-08-21 
> 09:47:59,342::cpu_load_no_engine::142::cpu_load_no_engine.CpuLoadNoEngine::(calculate_load)
>  System load total=0.0241, engine=0.0013, non-engine=0.0228
> Thread-3::INFO::2022-08-21 
> 09:48:01,311::mem_free::51::mem_free.MemFree::(action) memFree: 96106
> Thread-5::INFO::2022-08-21 
> 09:48:05,612::engine_health::246::engine_health.EngineHealth::(_result_from_stats)
>  VM is up on this host with healthy engine
> Thread-2::INFO::2022-08-21 
> 09:48:08,591::mgmt_bridge::65::mgmt_bridge.MgmtBridge::(action) Found bridge 
> ovirtmgmt in up state
> Thread-1::WARNING::2022-08-21 
> 09:48:10,352::network::121::network.Network::(_dns) DNS query failed:
> ; <<>> DiG 9.11.36-RedHat-9.11.36-3.el8 <<>> +tries=1 +time=5 +tcp
> ;; global options: +cmd
> ;; connection timed out; no servers could be reached
>
> Thread-1::WARNING::2022-08-21 
> 09:48:10,352::network::92::network.Network::(action) Failed to verify network 
> status, (4 out of 5)
> Thread-3::INFO::2022-08-21 
> 09:48:11,389::mem_free::51::mem_free.MemFree::(action) memFree: 96089
> Thread-5::INFO::2022-08-21 
> 09:48:15,707::engine_health::246::engine_health.EngineHealth::(_result_from_stats)
>  VM is up on this host with healthy engine
> Thread-2::INFO::2022-08-21 
> 09:48:18,662::mgmt_bridge::65::mgmt_bridge.MgmtBridge::(action) Found bridge 
> ovirtmgmt in up state
> Thread-1::WARNING::2022-08-21 
> 09:48:18,879::network::121::network.Network::(_dns) DNS query failed:
> ; <<>> DiG 9.11.36-RedHat-9.11.36-3.el8 <<>> +tries=1 +time=5 +tcp
> ;; global options: +cmd
> ;; connection timed out; no servers could be reached
>
> Thread-3::INFO::2022-08-21 
> 09:48:21,467::mem_free::51::mem_free.MemFree::(action) memFree: 96072
> Thread-1::WARNING::2022-08-21 
> 09:48:24,904::network::121::network.Network::(_dns) DNS query failed:
> ; <<>> DiG 9.11.36-RedHat-9.11.36-3.el8 <<>> +tries=1 +time=5 +tcp
> ;; global options: +cmd
> ;; connection timed out; no servers could be reached
>
> Thread-5::INFO::2022-08-21 
> 09:48:25,808::engine_health::246::engine_health.EngineHealth::(_result_from_stats)
>  VM is up on this host with healthy engine
> Thread-2::INFO::2022-08-21 
> 09:48:28,740::mgmt_bridge::65::mgmt_bridge.MgmtBridge::(action) Found bridge 
> ovirtmgmt in up state
> Thread-1::WARNING::2022-08-21 
> 09:48:30,416::network::121::network.Network::(_dns) DNS query failed:
> ; <<>> DiG 9.11.36-RedHat-9.11.36-3.el8 <<>> +tries=1 +time=5 +tcp
> ;; global options: +cmd
> ;; connection timed out; no servers could be reached
>
> Thread-1::WARNING::2022-08-21 
> 09:48:30,416::network::92::network.Network::(action) Failed to verify network 
> status, (2 out of 5)
> Thread-3::INFO::2022-08-21 
> 09:48:31,545::mem_free::51::mem_free.MemFree::(action) memFree: 96064
> Thread-5::INFO::2022-08-21 
> 09:48:35,909::engine_health::246::engine_health.EngineHealth::(_result_from_stats)
>  VM is up on this host with healthy engine
> Thread-1::WARNING::2022-08-21 
> 09:48:35,940::network::121::network.Network::(_dns) DNS query failed:
> ; <<>> DiG 9.11.36-RedHat-9.11.36-3.el8 <<>> +tries=1 +time=5 +tcp
> ;; global options: +cmd
> ;; connection timed out; no servers could be reached
>
> Thread-1::WARNING::2022-08-21 
> 09:48:37,480::network::92::network.Network::(action) Failed to verify network 
> status, (4 out of 5)
> Thread-2::INFO::2022-08-21 
> 09:48:38,809::mgmt_bridge::65::mgmt_bridge.MgmtBridge::(action) Found bridge 
> ovirtmgmt in up state
> Thread-3::INFO::2022-08-21 
> 09:48:41,623::mem_free::51::mem_free.MemFree::(action) memFree: 96014
> Thread-1::INFO::2022-08-21 
> 09:48:42,549::network::88::network.Network::(action) Successfully verified 
> network status
> Thread-5::INFO::2022-08-21 
> 09:48:46,011::engine_health::246::engine_health.EngineHealth::(_result_from_stats)
>  VM is up on this host with healthy engine
> Listener::ERROR::2022-08-21 
> 09:48:46,639::notifications::42::ovirt_hosted_engine_ha.broker.notifications.Notifications::(send_email)
>  (530, b'5.7.1 Authentication required', 'ale...@example.org.hr')
>
> At first I thought that it is related to this bugs.
> https://lists.ovirt.org/archives/list/users@ovirt.org/thread/2HTD5WR43M5MUTEDMM4HRFBADIXEQNB4/
> https://bugzilla.redhat.com/show_bug.cgi?id=1984356
>
> But in this oVirt version that bug should already be solved.

Indeed - as you can see in your log snippet above, dig was called with '+tcp'.

> I was trying to monitor network, but this error keeps happening even if 
> network load is low.

It might be something else.

> I did try to do continuous dig and ping commands on VM-s running on same host 
> as Hosted Engine, and did not have any network problems, not even one 
> connection drop.

What exactly do you test?

Before fixing the above bug, we added to ovirt-system-tests loops of
'dig', and did see drops - not many, but enough, apparently, and
often.

The 'dig' test is not very configurable, from -ha's POV - but you do
have control over it from elsewhere - resolv.conf, your name server,
etc. Also, note that it runs 'dig' without passing a query, and the
default query is for '.' - the root - perhaps your name server has
some problem with this?

>
> Any solutions or next steps i should try?

You can configure the agent/broker to log at DEBUG level, to see some
more details.

You can also change the network monitoring method, and/or configure
options for methods that do have them - e.g. 'tcp' and
'tcp_t_address', 'tcp_t_port'. See e.g.:

https://www.ovirt.org/documentation/administration_guide/index.html#Administering_the_Manager_Virtual_Machine
https://www.ovirt.org/develop/release-management/features/sla/hosted-engine-network-check.html

Best regards,
-- 
Didi
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/2CNLAZZF2D5PGEGINZVQ4QV22J27BN65/

Reply via email to