Hello Albert,
thanks, and sorry for the late response.
On Sept. 26 vdsmd went down on the host again. (didn't monitor properly, so,
only now, I realized it was down:)
This time I could not find any segvs in the logs.
When did it start?
We used until now, 4.3, and then upgraded first to 4.5.1 (through 4.4), as
described in the documentation. Until then, we never encountered crashes of the
vdsmd. Only ony after going to 4.5.2 and switching to compatibility level 4.7,
random crashed of the vdsmd happen.
The Ovirt engine, btw. is on a seperate Virtual machine, outside of the ovirt
cluster.
On the host ,/var/log/vdsm/vdsm.log:
I only can see, logging stoppped, and after restarting vdsmd, it started again.
Also, after restarting vdsmd, all Virtual machines on that host are shown with
the proper running state.
---<snipp>--
2022-09-26 11:52:25,161+0200 INFO (jsonrpc/6) [api.host] FINISH getAllVmStats
return={'status': {'code': 0, 'message': 'Done'}, 'statsList': (suppressed)}
from=::1,58732 (api:54)
2022-09-26 11:52:25,316+0200 INFO (periodic/2) [vdsm.api] START
repoStats(domains=()) from=internal,
task_id=cfb0ce21-b884-4fd9-baa6-d020107035fe (api:48)
2022-09-26 11:52:25,317+0200 INFO (periodic/2) [vdsm.api] FINISH repoStats
return={'6f018fbd-de93-4c56-880d-8ede2aad2674': {'code': 0, 'lastCheck': '4.6',
'delay': '0.000571996', 'valid': True, 'version': 5, 'a
cquired': True, 'actual': True}, '41012bfb-b802-4092-b699-7f5284d95c8e':
{'code': 0, 'lastCheck': '5.7', 'delay': '0.00073916', 'valid': True,
'version': 5, 'acquired': True, 'actual': True}, 'a15496dc-c241-4658
-af9d-0dfe11783916': {'code': 0, 'lastCheck': '4.4', 'delay': '0.00066827',
'valid': True, 'version': 5, 'acquired': True, 'actual': True},
'2c870e06-6c70-45ec-b665-ce29408c8a8e': {'code': 0, 'lastCheck': '4.4',
'delay': '0.000624771', 'valid': True, 'version': 5, 'acquired': True,
'actual': True}, '8008fa12-17c7-4d9f-b871-153bdd4a283e': {'code': 0,
'lastCheck': '1.8', 'delay': '0.000390569', 'valid': True, 'version':
5, 'acquired': True, 'actual': True}} from=internal,
task_id=cfb0ce21-b884-4fd9-baa6-d020107035fe (api:54)
2022-09-26 11:52:26,171+0200 INFO (jsonrpc/4) [vdsm.api] START
getSpmStatus(spUUID='5836aaac-0030-0064-024d-0000000002e4')
from=::ffff:10.2.0.4,37000, task_id=d9d88470-3f04-4060-8d19-f9c68fa928ec
(api:48)
2022-09-26 11:52:26,191+0200 INFO (jsonrpc/4) [vdsm.api] FINISH getSpmStatus
return={'spm_st': {'spmStatus': 'SPM', 'spmLver': 66, 'spmId': 2}}
from=::ffff:10.2.0.4,37000, task_id=d9d88470-3f04-4060-8d19-f9c68f
a928ec (api:54)
2022-09-26 11:52:26,218+0200 INFO (jsonrpc/5) [vdsm.api] START
getStoragePoolInfo(spUUID='5836aaac-0030-0064-024d-0000000002e4')
from=::ffff:10.2.0.4,54708, task_id=96aab09f-60ca-4ae7-be31-aa3070a08030
(api:48)
2022-09-26 11:52:26,240+0200 INFO (jsonrpc/5) [vdsm.api] FINISH
getStoragePoolInfo return={'info': {'domains':
'8008fa12-17c7-4d9f-b871-153bdd4a283e:Active,6f018fbd-de93-4c56-880d-8ede2aad2674:Active,2c870e06-6
c70-45ec-b665-ce29408c8a8e:Active,a15496dc-c241-4658-af9d-0dfe11783916:Active,41012bfb-b802-4092-b699-7f5284d95c8e:Active',
'isoprefix': '', 'lver': 66, 'master_uuid':
'41012bfb-b802-4092-b699-7f5284d95c8e', 'ma
ster_ver': 293, 'name': 'No Description', 'pool_status': 'connected', 'spm_id':
2, 'type': 'ISCSI', 'version': '5'}, 'dominfo':
{'8008fa12-17c7-4d9f-b871-153bdd4a283e': {'status': 'Active', 'alerts': [],
'isopre
fix': '', 'version': 5, 'disktotal': '879066349568', 'diskfree':
'863233900544'}, '6f018fbd-de93-4c56-880d-8ede2aad2674': {'status': 'Active',
'alerts': [], 'isoprefix': '', 'version': 5, 'disktotal': '854966927
36', 'diskfree': '67779952640'}, '2c870e06-6c70-45ec-b665-ce29408c8a8e':
{'status': 'Active', 'alerts': [], 'isoprefix': '', 'version': 5, 'disktotal':
'536468258816', 'diskfree': '445602856960'}, 'a15496dc-c241
-4658-af9d-0dfe11783916': {'status': 'Active', 'alerts': [], 'isoprefix': '',
'version': 5, 'disktotal': '2683951906816', 'diskfree': '1178028998656'},
'41012bfb-b802-4092-b699-7f5284d95c8e': {'status': 'Active'
, 'alerts': [], 'isoprefix': '', 'version': 5, 'disktotal': '536468258816',
'diskfree': '450300477440'}}} from=::ffff:10.2.0.4,54708,
task_id=96aab09f-60ca-4ae7-be31-aa3070a08030 (api:54)
2022-09-26 11:52:29,273+0200 INFO (jsonrpc/1) [api.host] START getAllVmStats()
from=::ffff:10.2.0.4,37000 (api:48)
2022-09-26 11:52:29,302+0200 INFO (jsonrpc/1) [api.host] FINISH getAllVmStats
return={'status': {'code': 0, 'message': 'Done'}, 'statsList': (suppressed)}
from=::ffff:10.2.0.4,37000 (api:54)
2022-09-26 11:52:33,280+0200 INFO (jsonrpc/2) [api.host] START getStats()
from=::ffff:10.2.0.4,37000 (api:48)
2022-09-26 11:52:33,392+0200 INFO (jsonrpc/2) [vdsm.api] START
repoStats(domains=()) from=::ffff:10.2.0.4,37000,
task_id=a6e9dfd1-ed16-4178-b2b5-3396216e39cc (api:48)
2022-09-26 11:52:33,393+0200 INFO (jsonrpc/2) [vdsm.api] FINISH repoStats
return={'6f018fbd-de93-4c56-880d-8ede2aad2674': {'code': 0, 'lastCheck': '2.6',
'delay': '0.000660599', 'valid': True, 'version': 5, 'ac
quired': True, 'actual': True}, '41012bfb-b802-4092-b699-7f5284d95c8e':
{'code': 0, 'lastCheck': '3.7', 'delay': '0.00630966', 'valid': True,
'version': 5, 'acquired': True, 'actual': True}, 'a15496dc-c241-4658-
af9d-0dfe11783916': {'code': 0, 'lastCheck': '2.5', 'delay': '0.000574606',
'valid': True, 'version': 5, 'acquired': True, 'actual': True},
'2c870e06-6c70-45ec-b665-ce29408c8a8e': {'code': 0, 'lastCheck': '2.5',
'delay': '0.000632427', 'valid': True, 'version': 5, 'acquired': True,
'actual': True}, '8008fa12-17c7-4d9f-b871-153bdd4a283e': {'code': 0,
'lastCheck': '5.2', 'delay': '0.000390569', 'valid': True, 'version':
5, 'acquired': True, 'actual': True}} from=::ffff:10.2.0.4,37000,
task_id=a6e9dfd1-ed16-4178-b2b5-3396216e39cc (api:54)
2022-09-26 11:52:33,393+0200 INFO (jsonrpc/2) [vdsm.api] START
multipath_health() from=::ffff:10.2.0.4,37000,
task_id=730e795b-ea87-4523-8711-caa27d7491fa (api:48)
2022-09-26 11:52:33,394+0200 INFO (jsonrpc/2) [vdsm.api] FINISH
multipath_health return={} from=::ffff:10.2.0.4,37000,
task_id=730e795b-ea87-4523-8711-caa27d7491fa (api:54)
2022-09-26 11:52:33,413+0200 INFO (jsonrpc/2) [api.host] FINISH getStats
return={'status': {'code': 0, 'message': 'Done'}, 'info': (suppressed)}
from=::ffff:10.2.0.4,37000 (api:54)
2022-10-01 07:03:26,112+0200 INFO (MainThread) [vds] (PID: 4130768) I am the
actual vdsm 4.50.2.2.1 vserv05 (4.18.0-372.26.1.el8_6.x86_64) (vdsmd:152)
2022-10-01 07:03:26,113+0200 INFO (MainThread) [vds] VDSM will run with cpu
affinity: frozenset({1}) (vdsmd:271)
2022-10-01 07:03:26,122+0200 INFO (MainThread) [storage.hsm] START HSM init
(hsm:217)
2022-10-01 07:03:26,125+0200 INFO (MainThread) [storage.hsm] Creating
data-center mount directory '/rhev/data-center/mnt' (hsm:222)
2022-10-01 07:03:26,125+0200 INFO (MainThread) [storage.fileutils] Creating
directory: /rhev/data-center/mnt mode: None (fileUtils:231)
2022-10-01 07:03:26,127+0200 INFO (MainThread) [storage.storagepool]
Unmounting master
/rhev/data-center/mnt/blockSD/41012bfb-b802-4092-b699-7f5284d95c8e/master
(sp:438)
2022-10-01 07:03:26,215+0200 INFO (MainThread) [storage.mount] unmounting
/rhev/data-center/mnt/blockSD/41012bfb-b802-4092-b699-7f5284d95c8e/master
(mount:215)
2022-10-01 07:03:26,338+0200 INFO (MainThread) [storage.hsm] Unlinking file
'/rhev/data-center/5836aaac-0030-0064-024d-0000000002e4/2c870e06-6c70-45ec-b665-ce29408c8a8e'
(hsm:344)
2022-10-01 07:03:26,339+0200 INFO (MainThread) [storage.hsm] Unlinking file
'/rhev/data-center/5836aaac-0030-0064-024d-0000000002e4/6f018fbd-de93-4c56-880d-8ede2aad2674'
(hsm:344)
2022-10-01 07:03:26,339+0200 INFO (MainThread) [storage.hsm] Unlinking file
'/rhev/data-center/5836aaac-0030-0064-024d-0000000002e4/a15496dc-c241-4658-af9d-0dfe11783916'
(hsm:344)
---<snapp>---
On Ovirt-engine, I can see, it cannot communicate with vdsmd, wich is not
really surprinsing, when the vdsmd is down.
---<snipp>---
2022-09-26 11:53:51,974+02 INFO
[org.ovirt.engine.core.bll.aaa.TerminateSessionsForTokenCommand] (default
task-237) [5ad44f73] Running command: TerminateSessionsForTokenCommand
internal: true.
2022-09-26 11:53:54,705+02 INFO
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) []
Connecting to /10.2.0.5
2022-09-26 11:53:54,705+02 INFO
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) []
Connected to /10.2.0.5:54321
2022-09-26 11:53:54,707+02 ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-15) []
Unable to RefreshCapabilities: ConnectExce
ption: Connection refused
2022-09-26 11:53:54,715+02 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-15) []
Command 'GetCapabilitiesAsy
ncVDSCommand(HostName = vserv05,
VdsIdAndVdsVDSCommandParametersBase:{hostId='75188aec-44ef-43b9-92b5-e5b06ae22ada',
vds='Host[vserv05,75188aec-44ef-43b9-92b5-e5b06ae22ada]'})' execution failed:
java.net.Connect
Exception: Connection refused
2022-09-26 11:53:56,246+02 ERROR
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) []
Connection timeout for host '10.2.0.5', last response arrived 90001 ms ago.
2022-09-26 11:53:56,269+02 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-51)
[3ba89774] EVENT_ID: IRS_BROKER_COMMAND_FAILURE(10,803), VDSM command
GetStoragePoolInfoVDS failed: Connection timeout for host '10.2.0.5', last
response arrived 90001 ms ago.
2022-09-26 11:53:56,270+02 ERROR
[org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-51)
[3ba89774] ERROR,
org.ovirt.engine.core.vdsbroker.irsbroker.GetStoragePoolInfoVDSCommand,
exception: VDSGenericException: VDSNetworkException: Connection timeout for
host '10.2.0.5', last response arrived 90001 ms ago., log id: c571eb8
2022-09-26 11:53:56,273+02 ERROR
[org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-51)
[3ba89774] Exception VDSNetworkException: VDSGenericException:
VDSNetworkException: Connection timeout for host '10.2.0.5', last response
arrived 90001 ms ago.
2022-09-26 11:53:57,726+02 INFO
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) []
Connecting to /10.2.0.5
2022-09-26 11:53:57,727+02 INFO
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) []
Connected to /10.2.0.5:54321
2022-09-26 11:53:57,732+02 ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-85) []
Unable to RefreshCapabilities: ConnectException: Connection refused
2022-09-26 11:53:57,734+02 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-85) []
Command 'GetCapabilitiesAsyncVDSCommand(HostName = vserv05,
VdsIdAndVdsVDSCommandParametersBase:{hostId='75188aec-44ef-43b9-92b5-e5b06ae22ada',
vds='Host[vserv05,75188aec-44ef-43b9-92b5-e5b06ae22ada]'})' execution failed:
java.net.ConnectException: Connection refused
2022-09-26 11:54:00,750+02 INFO
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) []
Connecting to /10.2.0.5
2022-09-26 11:54:00,750+02 INFO
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) []
Connected to /10.2.0.5:54321
2022-09-26 11:54:00,754+02 ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-50) []
Unable to RefreshCapabilities: ConnectException: Connection refused
2022-09-26 11:54:00,758+02 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-50) []
Command 'GetCapabilitiesAsyncVDSCommand(HostName = vserv05,
VdsIdAndVdsVDSCommandParametersBase:{hostId='75188aec-44ef-43b9-92b5-e5b06ae22ada',
vds='Host[vserv05,75188aec-44ef-43b9-92b5-e5b06ae22ada]'})' execution failed:
java.net.ConnectException: Connection refused
2022-09-26 11:54:03,772+02 INFO
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) []
Connecting to /10.2.0.5
2022-09-26 11:54:03,773+02 INFO
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) []
Connected to /10.2.0.5:54321
2022-09-26 11:54:03,780+02 ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-100) []
Unable to RefreshCapabilities: ConnectException: Connection refused
2022-09-26 11:54:03,782+02 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-100) []
Command 'GetCapabilitiesAsyncVDSCommand(HostName = vserv05,
VdsIdAndVdsVDSCommandParametersBase:{hostId='75188aec-44ef-43b9-92b5-e5b06ae22ada',
vds='Host[vserv05,75188aec-44ef-43b9-92b5-e5b06ae22ada]'})' execution failed:
java.net.ConnectException: Connection refused
2022-09-26 11:54:06,330+02 INFO
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) []
Connecting to /10.2.0.5
2022-09-26 11:54:06,330+02 INFO
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) []
Connected to /10.2.0.5:54321
2022-09-26 11:54:06,333+02 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-92) []
Command 'SpmStatusVDSCommand(HostName = vserv05,
SpmStatusVDSCommandParameters:{hostId='75188aec-44ef-43b9-92b5-e5b06ae22ada',
storagePoolId='5836aaac-0030-0064-024d-0000000002e4'})' execution failed:
java.net.ConnectException: Connection refused
2022-09-26 11:54:06,517+02 INFO
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) []
Connecting to /10.2.0.5
2022-09-26 11:54:06,517+02 INFO
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) []
Connected to /10.2.0.5:54321
---<snapp>----
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct:
https://www.ovirt.org/community/about/community-guidelines/
List Archives:
https://lists.ovirt.org/archives/list/[email protected]/message/5BA5HV57DO77Y5EN5SPMDN5CRKCYPCFB/