Hi,
I have a hyperconverged ovirt 4.3.10 installation. One host is unresponsive.
This has happened in the past but most often resolved itself within a few
minutes. This time it is a different beast.
The host does not come back. Putting the host in maintenance is not possible
because "Host is unresponsive but has running virtual machines". Putting the
host into local maintenance on the local CLI does not help. Global maintenance
does not have any effect either. The host has been rebooted several times both
locally and via the web interface on the engine, which worked but did not
resolve the issue.
Logs:
/var/log/ovirt-engine/engine.log
2022-06-07 06:40:17,606+02 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default
task-34) [26f286ba-58d0-44be-96e0-f874057fe603] EVENT_ID:
GENERIC_ERROR_MESSAGE(14,001), Cannot switch Host to Maintenance mode.
Host still has running VMs on it and is in Non Responsive state.
2022-06-07 06:40:17,606+02 WARN
[org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-34)
[26f286ba-58d0-44be-96e0-f874057fe603] Validation of action
'MaintenanceNumberOfVdss' failed for user admin@internal-authz. Reasons:
VAR__TYPE__HOST,VAR__ACTION__MAINTENANCE,VDS_CANNOT_MAINTENANCE_VDS_IS_NOT_RESPONDING_WITH_VMS,$HostNotResponding
ovirt3.os-s.de
2022-06-07 06:41:23,974+02 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engineScheduled-Thread-33) [] EVENT_ID:
VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ovirt3.os-s.de command Get Host
Capabilities failed: Message timeout which can be caused by communication issues
2022-06-07 06:41:23,974+02 ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
(EE-ManagedThreadFactory-engineScheduled-Thread-33) [] Unable to
RefreshCapabilities: VDSNetworkException: VDSGenericException:
VDSNetworkException: Message timeout which can be caused by communication issues
2022-06-07 06:44:47,033+02 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engineScheduled-Thread-4) [] EVENT_ID:
VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ovirt3.os-s.de command Get Host
Capabilities failed: Message timeout which can be caused by communication issues
2022-06-07 06:44:47,034+02 ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
(EE-ManagedThreadFactory-engineScheduled-Thread-4) [] Unable to
RefreshCapabilities: VDSNetworkException: VDSGenericException:
VDSNetworkException: Message timeout which can be caused by communication issues
vdsmd status:
Jun 06 08:51:09 ovirt3.os-s.de vdsm[3567]: WARN Worker blocked: <Worker
name=jsonrpc/1 running <Task <JsonRpcTask {'params': {}, 'jsonrpc': '2.0',
'method': u'Host.getStats', 'id': u'ce7ab5d1-3034-4b5a-91c5-9446e17341a2'} at
0x7fe1646563d0> timeout=60, duration=120.00 at 0x7fe1646562d0> task#=15 at
0x7fe165f47e50>, traceback:
File:
"/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
self.__bootstrap_inner()
File:
"/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
self.run()
File:
"/usr/lib64/python2.7/threading.py", line 765, in run
self.__target(*self.__args,
**self.__kwargs)
File:
"/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 260, in run
ret = func(*args, **kwargs)
File:
"/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
self._execute_task()
File:
"/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
task()
File:
"/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
self._callable()
File:
"/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 262, in __call__
self._handler(self._ctx, self._req)
File:
"/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 305, in
_serveRequest
response =
self._handle_request(req, ctx)
File:
"/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 345, in
_handle_request
res = method(**params)
File:
"/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 194, in
_dynamicMethod
result = fn(*methodArgs)
File: "<string>", line 2, in getStats
File:
"/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
ret = func(*args, **kwargs)
File:
"/usr/lib/python2.7/site-packages/vdsm/API.py", line 1435, in getStats
multipath=True)}
File:
"/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 79, in get_stats
ret['haStats'] = _getHaInfo()
File:
"/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo
stats =
instance.get_all_stats(timeout=5)
File:
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
line 94, in get_all_stats
stats =
broker.get_stats_from_storage()
File:
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
line 146, in get_stats_from_storage
result = self._proxy.get_stats()
File:
"/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
return self.__send(self.__name,
args)
File:
"/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
verbose=self.__verbose
File:
"/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
return self.single_request(host,
handler, request_body, verbose)
File:
"/usr/lib64/python2.7/xmlrpclib.py", line 1301, in single_request
self.send_content(h, request_body)
File:
"/usr/lib64/python2.7/xmlrpclib.py", line 1448, in send_content
connection.endheaders(request_body)
File:
"/usr/lib64/python2.7/httplib.py", line 1052, in endheaders
self._send_output(message_body)
File:
"/usr/lib64/python2.7/httplib.py", line 890, in _send_output
self.send(msg)
File:
"/usr/lib64/python2.7/httplib.py", line 852, in send
self.connect()
File:
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line
60, in connect
self.sock.connect(base64.b16decode(self.host))
File:
"/usr/lib64/python2.7/socket.py", line 224, in meth
return
getattr(self._sock,name)(*args)
Jun 06 08:51:14 ovirt3.os-s.de vdsm[3567]: WARN Worker blocked: <Worker
name=periodic/0 running <Task <Operation action=<vdsm.virt.sampling.HostMonitor
object at 0x7fe16457ba90> at 0x7fe16457bad0> timeout=15, duration=120.00 at
0x7fe1644c3910> task#=22 at 0x7fe1646e4550>, traceback:
File:
"/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
self.__bootstrap_inner()
File:
"/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
self.run()
File:
"/usr/lib64/python2.7/threading.py", line 765, in run
self.__target(*self.__args,
**self.__kwargs)
File:
"/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 260, in run
ret = func(*args, **kwargs)
File:
"/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
self._execute_task()
File:
"/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
task()
File:
"/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
self._callable()
File:
"/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 186, in __call__
self._func()
File:
"/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 481, in __call__
stats =
hostapi.get_stats(self._cif, self._samples.stats())
File:
"/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 79, in get_stats
ret['haStats'] = _getHaInfo()
File:
"/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo
stats =
instance.get_all_stats(timeout=5)
File:
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
line 94, in get_all_stats
stats =
broker.get_stats_from_storage()
File:
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
line 146, in get_stats_from_storage
result = self._proxy.get_stats()
File:
"/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
return self.__send(self.__name,
args)
File:
"/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
verbose=self.__verbose
File:
"/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
return self.single_request(host,
handler, request_body, verbose)
File:
"/usr/lib64/python2.7/xmlrpclib.py", line 1301, in single_request
self.send_content(h, request_body)
File:
"/usr/lib64/python2.7/xmlrpclib.py", line 1448, in send_content
connection.endheaders(request_body)
File:
"/usr/lib64/python2.7/httplib.py", line 1052, in endheaders
self._send_output(message_body)
File:
"/usr/lib64/python2.7/httplib.py", line 890, in _send_output
self.send(msg)
File:
"/usr/lib64/python2.7/httplib.py", line 852, in send
self.connect()
File:
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line
60, in connect
self.sock.connect(base64.b16decode(self.host))
File:
"/usr/lib64/python2.7/socket.py", line 224, in meth
return
getattr(self._sock,name)(*args)
vdsmd log:
2022-06-07 05:02:13,717+0000 INFO (jsonrpc/3) [vdsm.api] FINISH
multipath_health return={} from=::1,54704,
task_id=169b447f-fd52-4a6f-9277-eff9870ffb81 (api:54)
2022-06-07 05:02:13,725+0000 INFO (jsonrpc/3) [api.host] FINISH getStats
return={'status': {'message': 'Done', 'code': 0}, 'info': {'cpuStatistics':
{'20': {'cpuUser': '0.07', 'nodeIndex': 1, 'cpuSys': '0.00', 'cpuIdle':
'99.93'}, '21': {'cpuUser': '0.00', 'nodeIndex': 1, 'cpuSys': '0.00',
'cpuIdle': '100.00'}, '22': {'cpuUser': '0.00', 'nodeIndex': 1, 'cpuSys':
'0.00', 'cpuIdle': '100.00'}, '23': {'cpuUser': '0.00', 'nodeIndex': 1,
'cpuSys': '0.07', 'cpuIdle': '99.93'}, '1': {'cpuUser': '0.87', 'nodeIndex': 0,
'cpuSys': '0.60', 'cpuIdle': '98.53'}, '0': {'cpuUser': '0.40', 'nodeIndex': 0,
'cpuSys': '0.27', 'cpuIdle': '99.33'}, '3': {'cpuUser': '0.27', 'nodeIndex': 0,
'cpuSys': '0.33', 'cpuIdle': '99.40'}, '2': {'cpuUser': '0.53', 'nodeIndex': 0,
'cpuSys': '0.40', 'cpuIdle': '99.07'}, '5': {'cpuUser': '0.33', 'nodeIndex': 0,
'cpuSys': '0.33', 'cpuIdle': '99.34'}, '4': {'cpuUser': '0.33', 'nodeIndex': 0,
'cpuSys': '0.40', 'cpuIdle': '99.27'}, '7': {'cpuUser': '0.13', 'nodeIndex': 1,
'cpuSys': '0.07', 'cpuIdle': '99.80'}, '6': {'cpuUser': '0.13', 'nodeIndex':
1, 'cpuSys': '0.07', 'cpuIdle': '99.80'}, '9': {'cpuUser': '0.13', 'nodeIndex':
1, 'cpuSys': '0.13', 'cpuIdle': '99.74'}, '8': {'cpuUser': '0.07', 'nodeIndex':
1, 'cpuSys': '0.13', 'cpuIdle': '99.80'}, '11': {'cpuUser': '0.20',
'nodeIndex': 1, 'cpuSys': '0.13', 'cpuIdle': '99.67'}, '10': {'cpuUser':
'0.20', 'nodeIndex': 1, 'cpuSys': '0.27', 'cpuIdle': '99.53'}, '13':
{'cpuUser': '0.80', 'nodeIndex': 0, 'cpuSys': '0.47', 'cpuIdle': '98.73'},
'12': {'cpuUser': '0.00', 'nodeIndex': 0, 'cpuSys': '0.00', 'cpuIdle':
'100.00'}, '15': {'cpuUser': '0.07', 'nodeIndex': 0, 'cpuSys': '0.00',
'cpuIdle': '99.93'}, '14': {'cpuUser': '0.20', 'nodeIndex': 0, 'cpuSys':
'0.27', 'cpuIdle': '99.53'}, '17': {'cpuUser': '0.07', 'nodeIndex': 0,
'cpuSys': '0.00', 'cpuIdle': '99.93'}, '16': {'cpuUser': '0.07', 'nodeIndex':
0, 'cpuSys': '0.00', 'cpuIdle': '99.93'}, '19': {'cpuUser': '1.46',
'nodeIndex': 1, 'cpuSys': '0.33', 'cpuIdle'
: '98.21'}, '18': {'cpuUser': '0.27', 'nodeIndex': 1, 'cpuSys': '0.07',
'cpuIdle': '99.66'}}, 'numaNodeMemFree': {'1': {'memPercent': 18, 'memFree':
'52935'}, '0': {'memPercent': 17, 'memFree': '53551'}}, 'memShared': 0,
'haScore': 0, 'thpState': 'always', 'ksmMergeAcrossNodes': True, 'vmCount': 3,
'memUsed': '3', 'storageDomains': {}, 'incomingVmMigrations': 0, 'network':
{'bond0': {'rxErrors': '0', 'txErrors': '0', 'speed': '2000', 'rxDropped': '4',
'name': 'bond0', 'tx': '84766340728', 'txDropped': '0', 'duplex': 'full',
'sampleTime': 1654578133.717023, 'rx': '102755561238', 'state': 'up'}, 'eno50':
{'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name':
'eno50', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime':
1654578133.717023, 'rx': '0', 'state': 'down'}, ';vdsmdummy;': {'rxErrors':
'0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': ';vdsmdummy;',
'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1654578133.71702
3, 'rx': '0', 'state': 'down'}, 'ovirtmgmt': {'rxErrors': '0', 'txErrors':
'0', 'speed': '1000', 'rxDropped': '122', 'name': 'ovirtmgmt', 'tx':
'81789466633', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime':
1654578133.717023, 'rx': '99028291927', 'state': 'up'}, 'lo': {'rxErrors': '0',
'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'lo', 'tx':
'10214215104', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime':
1654578133.717023, 'rx': '10214215104', 'state': 'up'}, 'ovs-system':
{'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name':
'ovs-system', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime':
1654578133.717023, 'rx': '0', 'state': 'down'}, 'eno49': {'rxErrors': '0',
'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'eno49', 'tx': '0',
'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1654578133.717023, 'rx':
'0', 'state': 'down'}, 'br-int': {'rxErrors': '0', 'txErrors': '0', 'speed':
'1000', 'rxDropped': '0', 'na
me': 'br-int', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime':
1654578133.717023, 'rx': '0', 'state': 'down'}, 'eno1': {'rxErrors': '0',
'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'eno1', 'tx': '0',
'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1654578133.717023, 'rx':
'0', 'state': 'down'}, 'eno2': {'rxErrors': '0', 'txErrors': '0', 'speed':
'1000', 'rxDropped': '0', 'name': 'eno2', 'tx': '0', 'txDropped': '0',
'duplex': 'unknown', 'sampleTime': 1654578133.717023, 'rx': '0', 'state':
'down'}, 'eno3': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000',
'rxDropped': '1', 'name': 'eno3', 'tx': '42493183527', 'txDropped': '0',
'duplex': 'full', 'sampleTime': 1654578133.717023, 'rx': '20793574', 'state':
'up'}, 'eno4': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped':
'1', 'name': 'eno4', 'tx': '42273157201', 'txDropped': '0', 'duplex': 'full',
'sampleTime': 1654578133.717023, 'rx': '102734767664', 'state': 'up'},
'genev_sys_6081':
{'rxErrors': '0', 'txErrors': '8', 'speed': '1000', 'rxDropped': '0', 'name':
'genev_sys_6081', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown',
'sampleTime': 1654578133.717023, 'rx': '0', 'state': 'up'}}, 'txDropped':
'128', 'anonHugePages': '326', 'ksmPages': 100, 'elapsedTime': '391.12',
'cpuLoad': '0.16', 'cpuSys': '0.18', 'diskStats': {'/var/log': {'free':
'6478'}, '/var/run/vdsm/': {'free': '64294'}, '/tmp': {'free': '906'}},
'cpuUserVdsmd': '0.47', 'netConfigDirty': 'False', 'memCommitted': 28672,
'ksmState': False, 'vmMigrating': 0, 'ksmCpu': 0, 'memAvailable': 126975,
'bootTime': '1654505219', 'haStats': {'active': True, 'configured': True,
'score': 0, 'localMaintenance': True, 'globalMaintenance': False}, 'momStatus':
'active', 'multipathHealth': {}, 'rxDropped': '0', 'outgoingVmMigrations': 0,
'swapTotal': 32191, 'swapFree': 32191, 'hugepages': defaultdict(<type 'dict'>,
{1048576: {'resv_hugepages': 0, 'free_hugepages': 0, 'nr_overcommit_hugepages':
0, 'surplus_hugepages
': 0, 'vm.free_hugepages': 0, 'nr_hugepages': 0, 'nr_hugepages_mempolicy': 0},
2048: {'resv_hugepages': 0, 'free_hugepages': 0, 'nr_overcommit_hugepages': 0,
'surplus_hugepages': 0, 'vm.free_hugepages': 0, 'nr_hugepages': 0,
'nr_hugepages_mempolicy': 0}}), 'dateTime': '2022-06-07T05:02:13 GMT',
'cpuUser': '0.27', 'memFree': 126719, 'cpuIdle': '99.55', 'vmActive': 0,
'v2vJobs': {}, 'cpuSysVdsmd': '0.13'}} from=::1,54704 (api:54)
2022-06-07 05:02:13,726+0000 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call
Host.getStats succeeded in 0.03 seconds (__init__:312)
2022-06-07 05:02:13,973+0000 INFO (vmrecovery) [vdsm.api] START
getConnectedStoragePoolsList(options=None) from=internal,
task_id=65629949-c74d-4f66-846a-1e2c40be9157 (api:48)
2022-06-07 05:02:13,973+0000 INFO (vmrecovery) [vdsm.api] FINISH
getConnectedStoragePoolsList return={'poollist': []} from=internal,
task_id=65629949-c74d-4f66-846a-1e2c40be9157 (api:54)
2022-06-07 05:02:13,973+0000 INFO (vmrecovery) [vds] recovery: waiting for
storage pool to go up (clientIF:723)
2022-06-07 05:02:14,468+0000 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call
Host.ping2 succeeded in 0.00 seconds (__init__:312)
2022-06-07 05:02:14,471+0000 INFO (jsonrpc/5) [vdsm.api] START
repoStats(domains=[u'5d6276cc-08ab-47f6-81e7-4e64aac3d386']) from=::1,54704,
task_id=6918f715-bbf4-4c7f-af00-1764294ab665 (api:48)
2022-06-07 05:02:14,472+0000 INFO (jsonrpc/5) [vdsm.api] FINISH repoStats
return={} from=::1,54704, task_id=6918f715-bbf4-4c7f-af00-1764294ab665 (api:54)
2022-06-07 05:02:14,472+0000 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call
Host.getStorageRepoStats succeeded in 0.00 seconds (__init__:312)
Gluster volumes are all online.
systemctl status ovirt-ha-broker:
Jun 06 08:47:08 ovirt3.os-s.de systemd[1]: Started oVirt Hosted Engine High
Availability Communications Broker.
Jun 06 08:47:59 ovirt3.os-s.de ovirt-ha-broker[1524]: ovirt-ha-broker
mgmt_bridge.MgmtBridge ERROR Failed to getVdsStats: No 'network' in result
Any ideas?
Ralf
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct:
https://www.ovirt.org/community/about/community-guidelines/
List Archives:
https://lists.ovirt.org/archives/list/[email protected]/message/XT5WJCZKLIEDNI65IDTBY56VO7X7JOLT/