Hello,

I have an issue with our installation that feels sanlock-related, but I can't 
get it sorted. This is oVirt 4.2.1 with a hosted engine. One of our hosts is 
stuck in a loop. It:

 - gets a VDSM GetStatsVDS timeout and is marked as down,
 - throws a warning about not being fenced (fencing isn't enabled yet because 
of this problem),
 - and is set back to Up about a minute later.

This repeats every 4 minutes and 20 seconds.

The hosted engine is running on the host that is stuck like this, and it 
doesn't appear to get in the way of creating new VMs or other operations, but 
obviously I can't use fencing, which is a big part of the point of running 
oVirt in the first place.

I tried setting global maintenance and running hosted-engine 
--reinitialize-lockspace, which (a) took almost exactly 2 minutes to run, 
which makes me think something timed out, (b) exited with rc 0, and (c) didn't 
fix the problem.

Anyone have an idea of how to fix this?

-j



- - details - -

I still can't quite figure out how to interpret what sanlock says, but the -1s 
look wrong.

[sc5-ovirt-1]# sanlock client status
daemon bedae69e-03cc-49f8-88f4-9674a85a3185.sc5-ovirt-
p -1 helper
p -1 listener
p 122268 HostedEngine
p -1 status
s 1aabcd3a-3fd3-4902-b92e-17beaf8fe3fd:1:/rhev/data-center/mnt/glusterSD/172.16.0.151\:_sc5-images/1aabcd3a-3fd3-4902-b92e-17beaf8fe3fd/dom_md/ids:0
s b41eb20a-eafb-481b-9a50-a135cf42b15e:1:/rhev/data-center/mnt/glusterSD/sc5-gluster-10g-1\:_sc5-ovirt__engine/b41eb20a-eafb-481b-9a50-a135cf42b15e/dom_md/ids:0
r b41eb20a-eafb-481b-9a50-a135cf42b15e:8f0c9f7a-ae6a-476e-b6f3-a830dcb79e87:/rhev/data-center/mnt/glusterSD/172.16.0.153\:_sc5-ovirt__engine/b41eb20a-eafb-481b-9a50-a135cf42b15e/images/a9d01d59-f146-47e5-b514-d10f8867678e/8f0c9f7a-ae6a-476e-b6f3-a830dcb79e87.lease:0:5 p 122268
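
A minimal sketch for reading the status output above, in case it helps with 
interpretation. It assumes the usual sanlock layout: "p" lines are clients 
(pid and name), "s" lines are lockspaces (name:host_id:path:offset), and "r" 
lines are resources (lockspace:resource:path:offset:lver, followed by the 
owning pid). Those field names, and the helper names parse_status and 
split_unescaped, are assumptions made for illustration, not something taken 
from the sanlock sources. If that reading is right, the "p -1" entries may 
simply be the daemon's own internal clients rather than an error, but that is 
also an assumption worth checking against the sanlock man page.

```python
import re

# Lines copied from the status output in this post (each entry on one line).
SAMPLE = r"""daemon bedae69e-03cc-49f8-88f4-9674a85a3185.sc5-ovirt-
p -1 helper
p -1 listener
p 122268 HostedEngine
p -1 status
s b41eb20a-eafb-481b-9a50-a135cf42b15e:1:/rhev/data-center/mnt/glusterSD/sc5-gluster-10g-1\:_sc5-ovirt__engine/b41eb20a-eafb-481b-9a50-a135cf42b15e/dom_md/ids:0
r b41eb20a-eafb-481b-9a50-a135cf42b15e:8f0c9f7a-ae6a-476e-b6f3-a830dcb79e87:/rhev/data-center/mnt/glusterSD/172.16.0.153\:_sc5-ovirt__engine/b41eb20a-eafb-481b-9a50-a135cf42b15e/images/a9d01d59-f146-47e5-b514-d10f8867678e/8f0c9f7a-ae6a-476e-b6f3-a830dcb79e87.lease:0:5 p 122268
"""


def split_unescaped(text):
    """Split on ':' but leave backslash-escaped colons ('\\:') alone."""
    return re.split(r"(?<!\\):", text)


def parse_status(output):
    """Group status lines by their leading keyword (daemon, p, s, r)."""
    result = {"daemon": None, "clients": [], "lockspaces": [], "resources": []}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        kind, _, rest = line.partition(" ")
        if kind == "daemon":
            result["daemon"] = rest
        elif kind == "p":
            pid, _, name = rest.partition(" ")
            result["clients"].append({"pid": int(pid), "name": name})
        elif kind == "s":
            # Assumed layout: name:host_id:path:offset
            name, host_id, path, offset = split_unescaped(rest)
            result["lockspaces"].append({
                "name": name,
                "host_id": int(host_id),
                "path": path.replace("\\:", ":"),
                "offset": int(offset),
            })
        elif kind == "r":
            # Assumed layout: lockspace:resource:path:offset:lver p <pid>
            spec, _, owner = rest.partition(" p ")
            lockspace, resource, path, offset, lver = split_unescaped(spec)
            result["resources"].append({
                "lockspace": lockspace,
                "resource": resource,
                "path": path.replace("\\:", ":"),
                "offset": int(offset),
                "lver": int(lver),
                "owner_pid": int(owner) if owner else None,
            })
    return result


parsed = parse_status(SAMPLE)
for client in parsed["clients"]:
    print(client["pid"], client["name"])
```

Read this way, the one resource lease listed is held by pid 122268, which is 
the same pid as the HostedEngine client line.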


engine.log:

2018-03-21 16:09:26,081-07 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-67) [] EVENT_ID: 
VDS_BROKER_COMMAND_FAILURE(10,802), VDSM sc5-ovirt-1 command GetStatsVDS 
failed: Message timeout which can be caused by communication issues
2018-03-21 16:09:26,081-07 ERROR 
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand] 
(EE-ManagedThreadFactory-engineScheduled-Thread-67) [] Command 
'GetStatsVDSCommand(HostName = sc5-ovirt-1, 
VdsIdAndVdsVDSCommandParametersBase:{hostId='be3517e0-f79d-464c-8169-f786d13ac287',
 vds='Host[sc5-ovirt-1,be3517e0-f79d-464c-8169-f786d13ac287]'})' execution 
failed: VDSGenericException: VDSNetworkException: Message timeout which can be 
caused by communication issues
2018-03-21 16:09:26,081-07 ERROR 
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] 
(EE-ManagedThreadFactory-engineScheduled-Thread-67) [] Failed getting vds 
stats, host='sc5-ovirt-1'(be3517e0-f79d-464c-8169-f786d13ac287): 
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: 
VDSGenericException: VDSNetworkException: Message timeout which can be caused 
by communication issues
2018-03-21 16:09:26,081-07 ERROR 
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] 
(EE-ManagedThreadFactory-engineScheduled-Thread-67) [] Failure to refresh host 
'sc5-ovirt-1' runtime info: VDSGenericException: VDSNetworkException: Message 
timeout which can be caused by communication issues
2018-03-21 16:09:26,081-07 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] 
(EE-ManagedThreadFactory-engineScheduled-Thread-67) [] Failed to refresh VDS, 
network error, continuing, 
vds='sc5-ovirt-1'(be3517e0-f79d-464c-8169-f786d13ac287): VDSGenericException: 
VDSNetworkException: Message timeout which can be caused by communication issues
2018-03-21 16:09:26,081-07 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] 
(EE-ManagedThreadFactory-engine-Thread-102682) [] Host 'sc5-ovirt-1' is not 
responding.
2018-03-21 16:09:26,088-07 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engine-Thread-102682) [] EVENT_ID: 
VDS_HOST_NOT_RESPONDING(9,027), Host sc5-ovirt-1 is not responding. Host cannot 
be fenced automatically because power management for the host is disabled.
2018-03-21 16:09:27,070-07 INFO  
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] 
Connecting to sc5-ovirt-1/10.181.26.129
2018-03-21 16:09:27,918-07 INFO  
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] 
(DefaultQuartzScheduler4) [493fb316] START, 
GlusterServersListVDSCommand(HostName = sc5-gluster-2, 
VdsIdVDSCommandParametersBase:{hostId='797cbf42-6553-4a75-b8b1-93b2adbbc0db'}), 
log id: 6afccc01
2018-03-21 16:09:28,579-07 INFO  
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] 
(DefaultQuartzScheduler4) [493fb316] FINISH, GlusterServersListVDSCommand, 
return: [192.168.122.1/24:CONNECTED, sc5-gluster-3:CONNECTED, 
sc5-gluster-10g-1:CONNECTED], log id: 6afccc01
2018-03-21 16:09:28,606-07 INFO  
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] 
(DefaultQuartzScheduler4) [493fb316] START, 
GlusterVolumesListVDSCommand(HostName = sc5-gluster-2, 
GlusterVolumesListVDSParameters:{hostId='797cbf42-6553-4a75-b8b1-93b2adbbc0db'}),
 log id: 44e90100
2018-03-21 16:09:29,015-07 INFO  
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] 
(DefaultQuartzScheduler4) [493fb316] FINISH, GlusterVolumesListVDSCommand, 
return: 
{6fe949b5-894a-4843-b3e4-af81545574dc=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@140a4a60,
 bc29ba89-8fc0-494d-9fe5-bc7b34396b65=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@29637467},
 log id: 44e90100
2018-03-21 16:09:29,686-07 INFO  
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoVDSCommand] 
(EE-ManagedThreadFactory-engineScheduled-Thread-40) [] START, 
GetHardwareInfoVDSCommand(HostName = sc5-ovirt-1, 
VdsIdAndVdsVDSCommandParametersBase:{hostId='be3517e0-f79d-464c-8169-f786d13ac287',
 vds='Host[sc5-ovirt-1,be3517e0-f79d-464c-8169-f786d13ac287]'}), log id: 
6b1cb74b
2018-03-21 16:09:29,692-07 INFO  
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoVDSCommand] 
(EE-ManagedThreadFactory-engineScheduled-Thread-40) [] FINISH, 
GetHardwareInfoVDSCommand, log id: 6b1cb74b
2018-03-21 16:09:29,900-07 INFO  
[org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand] 
(EE-ManagedThreadFactory-engineScheduled-Thread-40) [576fddcc] Running command: 
HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities affected :  
ID: be3517e0-f79d-464c-8169-f786d13ac287 Type: VDS
2018-03-21 16:09:29,944-07 INFO  [org.ovirt.engine.core.bll.InitVdsOnUpCommand] 
(EE-ManagedThreadFactory-engineScheduled-Thread-40) [26c5f844] Running command: 
InitVdsOnUpCommand internal: true. Entities affected :  ID: 
c4e2ca40-1e72-11e8-beac-00163e0994d8 Type: StoragePool
2018-03-21 16:09:29,977-07 INFO  
[org.ovirt.engine.core.bll.storage.pool.ConnectHostToStoragePoolServersCommand] 
(EE-ManagedThreadFactory-engineScheduled-Thread-40) [41e6da49] Running command: 
ConnectHostToStoragePoolServersCommand internal: true. Entities affected :  ID: 
c4e2ca40-1e72-11e8-beac-00163e0994d8 Type: StoragePool
2018-03-21 16:09:30,002-07 INFO  
[org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] 
(EE-ManagedThreadFactory-engineScheduled-Thread-40) [41e6da49] START, 
ConnectStorageServerVDSCommand(HostName = sc5-ovirt-1, 
StorageServerConnectionManagementVDSParameters:{hostId='be3517e0-f79d-464c-8169-f786d13ac287',
 storagePoolId='c4e2ca40-1e72-11e8-beac-00163e0994d8', storageType='GLUSTERFS', 
connectionList='[StorageServerConnections:{id='0e2e93f1-3904-4d70-82aa-16bcc83ea314',
 connection='172.16.0.153:/sc5-ovirt_engine', iqn='null', vfsType='glusterfs', 
mountOptions='backup-volfile-servers=172.16.0.152:172.16.0.151', 
nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='null', 
netIfaceName='null'}, 
StorageServerConnections:{id='26c9dbd8-f550-4b7a-9f84-3e905f1a00db', 
connection='172.16.0.151:/sc5-images', iqn='null', vfsType='glusterfs', 
mountOptions='backup-volfile-servers=172.16.0.152:172.16.0.153', 
nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='null', 
netIfaceName='null'}]'}), log id: acd504a
2018-03-21 16:09:30,099-07 INFO  
[org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] 
(EE-ManagedThreadFactory-engineScheduled-Thread-40) [41e6da49] FINISH, 
ConnectStorageServerVDSCommand, return: 
{26c9dbd8-f550-4b7a-9f84-3e905f1a00db=0, 
0e2e93f1-3904-4d70-82aa-16bcc83ea314=0}, log id: acd504a
2018-03-21 16:09:30,107-07 INFO  
[org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] 
(EE-ManagedThreadFactory-engineScheduled-Thread-40) [41e6da49] START, 
ConnectStorageServerVDSCommand(HostName = sc5-ovirt-1, 
StorageServerConnectionManagementVDSParameters:{hostId='be3517e0-f79d-464c-8169-f786d13ac287',
 storagePoolId='c4e2ca40-1e72-11e8-beac-00163e0994d8', storageType='NFS', 
connectionList='[StorageServerConnections:{id='2239cb49-a8bb-49ee-9a5a-90d72c4602d0',
 connection='sc5-archive-10g-1:/var/ovirt/ovirt_iso_new', iqn='null', 
vfsType='null', mountOptions='null', nfsVersion='AUTO', nfsRetrans='null', 
nfsTimeo='null', iface='null', netIfaceName='null'}]'}), log id: 35528d0f
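
To double-check the 4 minute 20 second period from the log rather than by 
eye, a small sketch along these lines could measure the gap between 
consecutive "is not responding" warnings. It assumes each log entry sits on a 
single line in the real file (the excerpt above is wrapped by the mail 
client) and that timestamps look like the ones shown. The function names are 
made up for illustration, and the second sample line is hypothetical, added 
only to show the arithmetic.

```python
import re
from datetime import datetime

# Matches the leading timestamp, e.g. "2018-03-21 16:09:26,081".
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def parse_ts(line):
    """Return the leading timestamp as a datetime, or None if absent."""
    match = TS.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


def outage_intervals(lines, host):
    """Seconds between consecutive VdsManager 'not responding' warnings."""
    needle = f"Host '{host}' is not responding."
    times = [parse_ts(line) for line in lines if needle in line]
    times = [t for t in times if t is not None]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


sample = [
    # First line is from the excerpt above, unwrapped onto one line.
    "2018-03-21 16:09:26,081-07 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] "
    "(EE-ManagedThreadFactory-engine-Thread-102682) [] Host 'sc5-ovirt-1' is not responding.",
    # Hypothetical second event, 260 seconds later, for illustration only.
    "2018-03-21 16:13:46,081-07 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] "
    "(EE-ManagedThreadFactory-engine-Thread-102683) [] Host 'sc5-ovirt-1' is not responding.",
]

intervals = outage_intervals(sample, "sc5-ovirt-1")
print(intervals)  # prints [260.0]
```

Against a real file, the same function can be fed an open file handle, for 
example open("/var/log/ovirt-engine/engine.log"), assuming that is where the 
engine log lives on the engine VM. A steady run of values near 260 would 
confirm the period.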