Is your issue with the host certificates or the engine?
You can try to set a node into maintenance (or at least attempt that) and then
re-enroll its certificate from the UI.

Best Regards,
Strahil Nikolov
  On Sat, Feb 19, 2022 at 9:48, Joseph Gelinas <[email protected]> wrote:
I believe I ran `hosted-engine --deploy` on ovirt-1 to see if there was an
option to re-enroll that way, but when it prompted to confirm that this was
really what I wanted, I pressed Ctrl-D (or answered no) and it ran something
anyway, so I Ctrl-C'd out of it; maybe that is what messed up vdsm on that
node. I'm not sure about ovirt-3. Is there a way to fix that?

> On Feb 18, 2022, at 17:21, Joseph Gelinas <[email protected]> wrote:
> 
> Unfortunately ovirt-ha-broker & ovirt-ha-agent are just in continual restart 
> loops on ovirt-1 & ovirt-3 (ovirt-engine is currently on ovirt-3).
> 
> The output for broker.log:
> 
> MainThread::ERROR::2022-02-18 
> 22:08:58,101::broker::72::ovirt_hosted_engine_ha.broker.broker.Broker::(run) 
> Trying to restart the broker
> MainThread::INFO::2022-02-18 
> 22:08:58,453::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) 
> ovirt-hosted-engine-ha broker 2.4.5 started
> MainThread::INFO::2022-02-18 
> 22:09:00,456::monitor::45::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>  Searching for submonitors in 
> /usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/submonitors
> MainThread::INFO::2022-02-18 
> 22:09:00,456::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>  Loaded submonitor mem-free
> MainThread::INFO::2022-02-18 
> 22:09:00,457::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>  Loaded submonitor engine-health
> MainThread::INFO::2022-02-18 
> 22:09:00,459::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>  Loaded submonitor cpu-load-no-engine
> MainThread::INFO::2022-02-18 
> 22:09:00,459::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>  Loaded submonitor mgmt-bridge
> MainThread::INFO::2022-02-18 
> 22:09:00,459::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>  Loaded submonitor network
> MainThread::INFO::2022-02-18 
> 22:09:00,460::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>  Loaded submonitor storage-domain
> MainThread::INFO::2022-02-18 
> 22:09:00,460::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>  Loaded submonitor cpu-load
> MainThread::INFO::2022-02-18 
> 22:09:00,460::monitor::63::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>  Finished loading submonitors
> MainThread::WARNING::2022-02-18 
> 22:10:00,788::storage_broker::100::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
>  Can't connect vdsm storage: Couldn't  connect to VDSM within 60 seconds 
> MainThread::ERROR::2022-02-18 
> 22:10:00,788::broker::69::ovirt_hosted_engine_ha.broker.broker.Broker::(run) 
> Failed initializing the broker: Couldn't  connect to VDSM within 60 seconds
> MainThread::ERROR::2022-02-18 
> 22:10:00,789::broker::71::ovirt_hosted_engine_ha.broker.broker.Broker::(run) 
> Traceback (most recent call last):
>  File 
>"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/broker.py", 
>line 64, in run
>    self._storage_broker_instance = self._get_storage_broker()
>  File 
>"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/broker.py", 
>line 143, in _get_storage_broker
>    return storage_broker.StorageBroker()
>  File 
>"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
> line 97, in __init__
>    self._backend.connect()
>  File 
>"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py",
> line 370, in connect
>    connection = util.connect_vdsm_json_rpc(logger=self._logger)
>  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", 
>line 472, in connect_vdsm_json_rpc
>    __vdsm_json_rpc_connect(logger, timeout)
>  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", 
>line 415, in __vdsm_json_rpc_connect
>    timeout=VDSM_MAX_RETRY * VDSM_DELAY
> RuntimeError: Couldn't  connect to VDSM within 60 seconds
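For context, the timeout in the traceback above comes from a simple retry
loop: the agent retries the JSON-RPC connection VDSM_MAX_RETRY times with
VDSM_DELAY seconds between attempts, so the effective timeout is their
product. A rough sketch of the same logic (illustrative only; the names,
values, and stub function are assumptions, not the actual
ovirt_hosted_engine_ha code):

```shell
# Illustrative retry loop behind "Couldn't connect to VDSM within 60
# seconds". Real values multiply out to 60 s; shortened here so the
# sketch finishes quickly.
max_retry=5
delay=0.01

try_connect() {
    # Stand-in for the JSON-RPC connect; always fails, like a host
    # whose vdsmd is rejecting the TLS handshake.
    return 1
}

status=failed
for attempt in $(seq 1 "$max_retry"); do
    if try_connect; then
        status=connected
        break
    fi
    sleep "$delay"
done
echo "result: $status (after $attempt attempts)"
```

The practical takeaway is that the broker error is downstream: it simply
gives up after the retry window, so the thing to fix is vdsmd's TLS
handshake, not the broker itself.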
> 
> 
> vdsm.log:
> 
> 2022-02-18 22:14:43,939+0000 INFO  (vmrecovery) [vds] recovery: waiting for 
> storage pool to go up (clientIF:726)
> 2022-02-18 22:14:44,071+0000 INFO  (Reactor thread) 
> [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48832 
> (protocoldetector:61)
> 2022-02-18 22:14:44,074+0000 ERROR (Reactor thread) 
> [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: 
> ::1 (sslutils:269)
> 2022-02-18 22:14:44,442+0000 INFO  (Reactor thread) 
> [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48836 
> (protocoldetector:61)
> 2022-02-18 22:14:44,445+0000 ERROR (Reactor thread) 
> [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: 
> ::1 (sslutils:269)
> 2022-02-18 22:14:45,077+0000 INFO  (Reactor thread) 
> [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48838 
> (protocoldetector:61)
> 2022-02-18 22:14:45,435+0000 INFO  (periodic/2) [vdsm.api] START 
> repoStats(domains=()) from=internal, 
> task_id=2dd417e7-0f4f-4a09-a1af-725f267af135 (api:48)
> 2022-02-18 22:14:45,435+0000 INFO  (periodic/2) [vdsm.api] FINISH repoStats 
> return={} from=internal, task_id=2dd417e7-0f4f-4a09-a1af-725f267af135 (api:54)
> 2022-02-18 22:14:45,438+0000 WARN  (periodic/2) [root] Failed to retrieve 
> Hosted Engine HA info, is Hosted Engine setup finished? (api:194)
> 2022-02-18 22:14:45,447+0000 INFO  (Reactor thread) 
> [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48840 
> (protocoldetector:61)
> 2022-02-18 22:14:45,449+0000 ERROR (Reactor thread) 
> [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: 
> ::1 (sslutils:269)
> 2022-02-18 22:14:46,082+0000 INFO  (Reactor thread) 
> [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48842 
> (protocoldetector:61)
> 2022-02-18 22:14:46,084+0000 ERROR (Reactor thread) 
> [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: 
> ::1 (sslutils:269)
> 2022-02-18 22:14:46,452+0000 INFO  (Reactor thread) 
> [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48844 
> (protocoldetector:61)
> 2022-02-18 22:14:46,455+0000 ERROR (Reactor thread) 
> [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: 
> ::1 (sslutils:269)
> 2022-02-18 22:14:47,087+0000 INFO  (Reactor thread) 
> [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48846 
> (protocoldetector:61)
> 2022-02-18 22:14:47,089+0000 ERROR (Reactor thread) 
> [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: 
> ::1 (sslutils:269)
> 2022-02-18 22:14:47,457+0000 INFO  (Reactor thread) 
> [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48848 
> (protocoldetector:61)
> 2022-02-18 22:14:47,459+0000 ERROR (Reactor thread) 
> [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: 
> ::1 (sslutils:269)
> 2022-02-18 22:14:48,092+0000 INFO  (Reactor thread) 
> [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48850 
> (protocoldetector:61)
> 2022-02-18 22:14:48,094+0000 ERROR (Reactor thread) 
> [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: 
> ::1 (sslutils:269)
> 2022-02-18 22:14:48,461+0000 INFO  (Reactor thread) 
> [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:48852 
> (protocoldetector:61)
> 2022-02-18 22:14:48,464+0000 ERROR (Reactor thread) 
> [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: 
> ::1 (sslutils:269)
> 2022-02-18 22:14:48,941+0000 INFO  (vmrecovery) [vdsm.api] START 
> getConnectedStoragePoolsList(options=None) from=internal, 
> task_id=75ef5d5f-c56b-4595-95c8-3dc64caa3a83 (api:48)
> 2022-02-18 22:14:48,942+0000 INFO  (vmrecovery) [vdsm.api] FINISH 
> getConnectedStoragePoolsList return={'poollist': []} from=internal, 
> task_id=75ef5d5f-c56b-4595-95c8-3dc64caa3a83 (api:54)
> 
> 
> 
>> On Feb 18, 2022, at 16:35, Strahil Nikolov via Users <[email protected]> wrote:
>> 
>> ovirt-2 is 'state=GlobalMaintenance', but the other two nodes are unknown.
>> Try to start ovirt-ha-broker & ovirt-ha-agent on those nodes.
>> 
>> Also, you may try to move the hosted-engine to ovirt-2 and try again
>> 
>> 
>> Best Regards,
>> Strahil Nikolov
>> 
>> On Fri, Feb 18, 2022 at 21:48, Joseph Gelinas
>> <[email protected]> wrote:
>> It may be in maintenance mode; I did try to set it at the beginning of all
>> this, but engine-setup doesn't see it. At this point my nodes say they
>> can't connect to the HA daemon, or have stale data.
>> 
>> [root@ovirt-1 ~]# hosted-engine --set-maintenance --mode=global
>> Cannot connect to the HA daemon, please check the logs.
>> 
>> [root@ovirt-3 ~]# hosted-engine --set-maintenance --mode=global
>> Cannot connect to the HA daemon, please check the logs.
>> 
>> [root@ovirt-2 ~]# hosted-engine --set-maintenance --mode=global
>> [root@ovirt-2 ~]# hosted-engine --vm-status
>> 
>> 
>> !! Cluster is in GLOBAL MAINTENANCE mode !!
>> 
>> 
>> 
>> --== Host ovirt-1.xxxxxx.com (id: 1) status ==--
>> 
>> Host ID                            : 1
>> Host timestamp                    : 6750990
>> Score                              : 0
>> Engine status                      : unknown stale-data
>> Hostname                          : ovirt-1.xxxxxx.com
>> Local maintenance                  : False
>> stopped                            : True
>> crc32                              : 5290657b
>> conf_on_shared_storage            : True
>> local_conf_timestamp              : 6750950
>> Status up-to-date                  : False
>> Extra metadata (valid at timestamp):
>>    metadata_parse_version=1
>>    metadata_feature_version=1
>>    timestamp=6750990 (Thu Feb 17 22:17:53 2022)
>>    host-id=1
>>    score=0
>>    vm_conf_refresh_time=6750950 (Thu Feb 17 22:17:12 2022)
>>    conf_on_shared_storage=True
>>    maintenance=False
>>    state=AgentStopped
>>    stopped=True
>> 
>> 
>> --== Host ovirt-3.xxxxxx.com (id: 2) status ==--
>> 
>> Host ID                            : 2
>> Host timestamp                    : 6731526
>> Score                              : 0
>> Engine status                      : unknown stale-data
>> Hostname                          : ovirt-3.xxxxxx.com
>> Local maintenance                  : False
>> stopped                            : True
>> crc32                              : 12c6b5c9
>> conf_on_shared_storage            : True
>> local_conf_timestamp              : 6731486
>> Status up-to-date                  : False
>> Extra metadata (valid at timestamp):
>>    metadata_parse_version=1
>>    metadata_feature_version=1
>>    timestamp=6731526 (Thu Feb 17 15:29:37 2022)
>>    host-id=2
>>    score=0
>>    vm_conf_refresh_time=6731486 (Thu Feb 17 15:28:57 2022)
>>    conf_on_shared_storage=True
>>    maintenance=False
>>    state=AgentStopped
>>    stopped=True
>> 
>> 
>> --== Host ovirt-2.xxxxxx.com (id: 3) status ==--
>> 
>> Host ID                            : 3
>> Host timestamp                    : 6829853
>> Score                              : 3400
>> Engine status                      : {"vm": "down", "health": "bad", 
>> "detail": "unknown", "reason": "vm not running on this host"}
>> Hostname                          : ovirt-2.xxxxxx.com
>> Local maintenance                  : False
>> stopped                            : False
>> crc32                              : 0779c0b8
>> conf_on_shared_storage            : True
>> local_conf_timestamp              : 6829853
>> Status up-to-date                  : True
>> Extra metadata (valid at timestamp):
>>    metadata_parse_version=1
>>    metadata_feature_version=1
>>    timestamp=6829853 (Fri Feb 18 19:25:17 2022)
>>    host-id=3
>>    score=3400
>>    vm_conf_refresh_time=6829853 (Fri Feb 18 19:25:17 2022)
>>    conf_on_shared_storage=True
>>    maintenance=False
>>    state=GlobalMaintenance
>>    stopped=False
>> 
>> 
>> !! Cluster is in GLOBAL MAINTENANCE mode !!
>> 
>> 
>> ovirt-ha-agent on ovirt-1 & ovirt-3 just keeps trying to restart:
>> 
>> MainThread::ERROR::2022-02-18 
>> 19:34:36,910::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>>  Trying to restart agent
>> MainThread::INFO::2022-02-18 
>> 19:34:36,910::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) 
>> Agent shutting down
>> MainThread::INFO::2022-02-18 
>> 19:34:47,268::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) 
>> ovirt-hosted-engine-ha agent 2.4.5 started
>> MainThread::INFO::2022-02-18 
>> 19:34:47,280::hosted_engine::242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
>>  Certificate common name not found, using hostname to identify host
>> MainThread::ERROR::2022-02-18 
>> 19:35:47,629::agent::143::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>>  Traceback (most recent call last):
>>  File 
>>"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", 
>>line 131, in _run_agent
>>    return action(he)
>>  File 
>>"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", 
>>line 55, in action_proper
>>    return he.start_monitoring()
>>  File 
>>"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>> line 436, in start_monitoring
>>    self._initialize_vdsm()
>>  File 
>>"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>> line 595, in _initialize_vdsm
>>    logger=self._log
>>  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", 
>>line 472, in connect_vdsm_json_rpc
>>    __vdsm_json_rpc_connect(logger, timeout)
>>  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", 
>>line 415, in __vdsm_json_rpc_connect
>>    timeout=VDSM_MAX_RETRY * VDSM_DELAY
>> RuntimeError: Couldn't  connect to VDSM within 60 seconds
>> 
>> 
>> ovirt-2's ovirt-hosted-engine-ha/agent.log has entries detecting global
>> maintenance, though `systemctl status ovirt-ha-agent` shows Python
>> exception errors from yesterday.
>> 
>> MainThread::INFO::2022-02-18 
>> 19:39:10,452::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
>>  Global maintenance detected
>> MainThread::INFO::2022-02-18 
>> 19:39:10,524::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
>>  Current state GlobalMaintenance (score: 3400)
>> 
>> 
>> Feb 17 18:49:12 ovirt-2.us1.vricon.com python3[1324125]: detected unhandled 
>> Python exception in 
>> '/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py'
>> 
>> 
>> 
>>> On Feb 18, 2022, at 14:20, Strahil Nikolov <[email protected]> wrote:
>>> 
>>> To set the engine into maintenance mode you can ssh to any Hypervisor and 
>>> run:
>>> 'hosted-engine --set-maintenance --mode=global'
>>> wait 1 minute and run 'hosted-engine --vm-status' to validate.
>>> 
>>> Best Regards,
>>> Strahil Nikolov
>>> 
>>> On Fri, Feb 18, 2022 at 19:03, Joseph Gelinas
>>> <[email protected]> wrote:
>>> Hi,
>>> 
>>> The certificates on our oVirt stack recently expired. While all the VMs
>>> are still up, I can't put the cluster into global maintenance via
>>> ovirt-engine, or do anything else via ovirt-engine for that matter; I
>>> just get event logs about cert validity.
>>> 
>>> VDSM ovirt-1.xxxxx.com command Get Host Capabilities failed: PKIX path 
>>> validation failed: java.security.cert.CertPathValidatorException: validity 
>>> check failed
>>> VDSM ovirt-2.xxxxx.com command Get Host Capabilities failed: PKIX path 
>>> validation failed: java.security.cert.CertPathValidatorException: validity 
>>> check failed
>>> VDSM ovirt-3.xxxxx.com command Get Host Capabilities failed: PKIX path 
>>> validation failed: java.security.cert.CertPathValidatorException: validity 
>>> check failed
>>> 
>>> Under Compute -> Hosts, all are status Unassigned. Default data center is 
>>> status Non Responsive.
>>> 
>>> I have tried a couple of solutions to regenerate the certificates without 
>>> much luck and have copied the originals back in place.
>>> 
>>> https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.3/html/upgrade_guide/replacing_sha-1_certificates_with_sha-256_certificates_4-1_local_db#Replacing_All_Signed_Certificates_with_SHA-256_4-1_local_db
>>> 
>>> https://access.redhat.com/solutions/2409751
>>> 
>>> 
>>> I have seen it said that running engine-setup will generate new certs;
>>> however, the engine doesn't think the cluster is in global maintenance,
>>> so it refuses to run. I believe I can get around the check with
>>> `engine-setup
>>> --otopi-environment=OVESETUP_CONFIG/continueSetupOnHEVM=bool:True`, but
>>> is that the right thing to do? Will it deploy the certs onto the hosts
>>> as well so things communicate properly? It looks like one is supposed to
>>> put a node into maintenance and re-enroll it after running engine-setup,
>>> but will the engine even be able to put the nodes into maintenance given
>>> that I can't do anything with them now?
>>> 
>>> Appreciate any ideas.
>>> 
>>> 
>>> _______________________________________________
>>> Users mailing list -- [email protected]
>>> To unsubscribe send an email to [email protected]
>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>> oVirt Code of Conduct: 
>>> https://www.ovirt.org/community/about/community-guidelines/
>>> List Archives: 
>>> https://lists.ovirt.org/archives/list/[email protected]/message/QCFPKQ3OKPOUV266MFJUMVTNG2OHLJVW/
> 