Good morning,

The requested info is below.
[root@ovirt-hyp-02 ~]# hosted-engine --vm-start
Exception in thread Client localhost:54321 (most likely raised during interpreter shutdown):
VM exists and its status is Up

[root@ovirt-hyp-02 ~]# ping engine
PING engine.example.lan (192.168.170.149) 56(84) bytes of data.
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=1 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=2 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=3 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=4 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=5 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=6 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=7 Destination Host Unreachable
From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=8 Destination Host Unreachable

[root@ovirt-hyp-02 ~]# gluster volume status engine
Status of volume: engine
Gluster process                                       TCP Port  RDMA Port  Online  Pid
---------------------------------------------------------------------------------------
Brick 192.168.170.141:/gluster_bricks/engine/engine   49159     0          Y       1799
Brick 192.168.170.143:/gluster_bricks/engine/engine   49159     0          Y       2900
Self-heal Daemon on localhost                         N/A       N/A        Y       2914
Self-heal Daemon on ovirt-hyp-01.example.lan          N/A       N/A        Y       1854

Task Status of Volume engine
---------------------------------------------------------------------------------------
There are no active volume tasks

[root@ovirt-hyp-02 ~]# gluster volume heal engine info
Brick 192.168.170.141:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

Brick 192.168.170.143:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

Brick 192.168.170.147:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

[root@ovirt-hyp-02 ~]# cat /var/log/glusterfs/rhev-data-center-mnt-glusterSD-ovirt-hyp-01.example.lan\:engine.log
[2017-06-15 13:37:02.009436] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing

Each of the three hosts sends out the following notifications about every 15 minutes:

Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineDown-EngineStart.
Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineStart-EngineStarting.
Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineStarting-EngineForceStop.
Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineForceStop-EngineDown.

Please let me know if you need any additional information. I've also added a rough sketch of the console and network checks I plan to try next, after the quoted thread at the bottom of this message.

Thank you,

Joel

On Jun 16, 2017 2:52 AM, "Sahina Bose" <[email protected]> wrote:

> From the agent.log,
> MainThread::INFO::2017-06-15 11:16:50,583::states::473::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine vm is running on host ovirt-hyp-02.reis.com (id 2)
>
> It looks like the HE VM was started successfully? Is it possible that the ovirt-engine service could not be started on the HE VM? Could you try to start the HE VM using the command below and then log into the VM console?
> # hosted-engine --vm-start
>
> Also, please check:
> # gluster volume status engine
> # gluster volume heal engine info
>
> Please also check if there are errors in the gluster mount logs - at /var/log/glusterfs/rhev-data-center-mnt..<engine>.log
>
> On Thu, Jun 15, 2017 at 8:53 PM, Joel Diaz <[email protected]> wrote:
>
>> Sorry, I forgot to attach the requested logs in the previous email.
>>
>> Thanks,
>>
>> On Jun 15, 2017 9:38 AM, "Joel Diaz" <[email protected]> wrote:
>>
>> Good morning,
>>
>> Requested info below, along with some additional info.
>>
>> You'll notice the data volume is not mounted.
>>
>> Any help in getting HE back running would be greatly appreciated.
>>
>> Thank you,
>>
>> Joel
>>
>> [root@ovirt-hyp-01 ~]# hosted-engine --vm-status
>>
>> --== Host 1 status ==--
>>
>> conf_on_shared_storage             : True
>> Status up-to-date                  : False
>> Hostname                           : ovirt-hyp-01.example.lan
>> Host ID                            : 1
>> Engine status                      : unknown stale-data
>> Score                              : 3400
>> stopped                            : False
>> Local maintenance                  : False
>> crc32                              : 5558a7d3
>> local_conf_timestamp               : 20356
>> Host timestamp                     : 20341
>> Extra metadata (valid at timestamp):
>>     metadata_parse_version=1
>>     metadata_feature_version=1
>>     timestamp=20341 (Fri Jun  9 14:38:57 2017)
>>     host-id=1
>>     score=3400
>>     vm_conf_refresh_time=20356 (Fri Jun  9 14:39:11 2017)
>>     conf_on_shared_storage=True
>>     maintenance=False
>>     state=EngineDown
>>     stopped=False
>>
>> --== Host 2 status ==--
>>
>> conf_on_shared_storage             : True
>> Status up-to-date                  : False
>> Hostname                           : ovirt-hyp-02.example.lan
>> Host ID                            : 2
>> Engine status                      : unknown stale-data
>> Score                              : 3400
>> stopped                            : False
>> Local maintenance                  : False
>> crc32                              : 936d4cf3
>> local_conf_timestamp               : 20351
>> Host timestamp                     : 20337
>> Extra metadata (valid at timestamp):
>>     metadata_parse_version=1
>>     metadata_feature_version=1
>>     timestamp=20337 (Fri Jun  9 14:39:03 2017)
>>     host-id=2
>>     score=3400
>>     vm_conf_refresh_time=20351 (Fri Jun  9 14:39:17 2017)
>>     conf_on_shared_storage=True
>>     maintenance=False
>>     state=EngineDown
>>     stopped=False
>>
>> --== Host 3 status ==--
>>
>> conf_on_shared_storage             : True
>> Status up-to-date                  : False
>> Hostname                           : ovirt-hyp-03.example.lan
>> Host ID                            : 3
>> Engine status                      : unknown stale-data
>> Score                              : 3400
>> stopped                            : False
>> Local maintenance                  : False
>> crc32                              : f646334e
>> local_conf_timestamp               : 20391
>> Host timestamp                     : 20377
>> Extra metadata (valid at timestamp):
>>     metadata_parse_version=1
>>     metadata_feature_version=1
>>     timestamp=20377 (Fri Jun  9 14:39:37 2017)
>>     host-id=3
>>     score=3400
>>     vm_conf_refresh_time=20391 (Fri Jun  9 14:39:51 2017)
>>     conf_on_shared_storage=True
>>     maintenance=False
>>     state=EngineStop
>>     stopped=False
>>     timeout=Thu Jan  1 00:43:08 1970
>>
>> [root@ovirt-hyp-01 ~]# gluster peer status
>> Number of Peers: 2
>>
>> Hostname: 192.168.170.143
>> Uuid: b2b30d05-cf91-4567-92fd-022575e082f5
>> State: Peer in Cluster (Connected)
>> Other names:
>> 10.0.0.2
>>
>> Hostname: 192.168.170.147
>> Uuid: 4e50acc4-f3cb-422d-b499-fb5796a53529
>> State: Peer in Cluster (Connected)
>> Other names:
>> 10.0.0.3
>>
>> [root@ovirt-hyp-01 ~]# gluster volume info all
>>
>> Volume Name: data
>> Type: Replicate
>> Volume ID: 1d6bb110-9be4-4630-ae91-36ec1cf6cc02
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: 192.168.170.141:/gluster_bricks/data/data
>> Brick2: 192.168.170.143:/gluster_bricks/data/data
>> Brick3: 192.168.170.147:/gluster_bricks/data/data (arbiter)
>> Options Reconfigured:
>> nfs.disable: on
>> performance.readdir-ahead: on
>> transport.address-family: inet
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> performance.low-prio-threads: 32
>> network.remote-dio: off
>> cluster.eager-lock: enable
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> cluster.data-self-heal-algorithm: full
>> cluster.locking-scheme: granular
>> cluster.shd-max-threads: 8
>> cluster.shd-wait-qlength: 10000
>> features.shard: on
>> user.cifs: off
>> storage.owner-uid: 36
>> storage.owner-gid: 36
>> network.ping-timeout: 30
>> performance.strict-o-direct: on
>> cluster.granular-entry-heal: enable
>>
>> Volume Name: engine
>> Type: Replicate
>> Volume ID: b160f0b2-8bd3-4ff2-a07c-134cab1519dd
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: 192.168.170.141:/gluster_bricks/engine/engine
>> Brick2: 192.168.170.143:/gluster_bricks/engine/engine
>> Brick3: 192.168.170.147:/gluster_bricks/engine/engine (arbiter)
>> Options Reconfigured:
>> nfs.disable: on
>> performance.readdir-ahead: on
>> transport.address-family: inet
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> performance.low-prio-threads: 32
>> network.remote-dio: off
>> cluster.eager-lock: enable
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> cluster.data-self-heal-algorithm: full
>> cluster.locking-scheme: granular
>> cluster.shd-max-threads: 8
>> cluster.shd-wait-qlength: 10000
>> features.shard: on
>> user.cifs: off
>> storage.owner-uid: 36
>> storage.owner-gid: 36
>> network.ping-timeout: 30
>> performance.strict-o-direct: on
>> cluster.granular-entry-heal: enable
>>
>> [root@ovirt-hyp-01 ~]# df -h
>> Filesystem                                     Size  Used Avail Use% Mounted on
>> /dev/mapper/centos_ovirt--hyp--01-root          50G  4.1G   46G   9% /
>> devtmpfs                                       7.7G     0  7.7G   0% /dev
>> tmpfs                                          7.8G     0  7.8G   0% /dev/shm
>> tmpfs                                          7.8G  8.7M  7.7G   1% /run
>> tmpfs                                          7.8G     0  7.8G   0% /sys/fs/cgroup
>> /dev/mapper/centos_ovirt--hyp--01-home          61G   33M   61G   1% /home
>> /dev/mapper/gluster_vg_sdb-gluster_lv_engine    50G  7.6G   43G  16% /gluster_bricks/engine
>> /dev/mapper/gluster_vg_sdb-gluster_lv_data     730G  157G  574G  22% /gluster_bricks/data
>> /dev/sda1                                      497M  173M  325M  35% /boot
>> ovirt-hyp-01.example.lan:engine                 50G  7.6G   43G  16% /rhev/data-center/mnt/glusterSD/ovirt-hyp-01.example.lan:engine
>> tmpfs                                          1.6G     0  1.6G   0% /run/user/0
>>
>> [root@ovirt-hyp-01 ~]# systemctl list-unit-files|grep ovirt
>> ovirt-ha-agent.service                        enabled
>> ovirt-ha-broker.service                       enabled
>> ovirt-imageio-daemon.service                  disabled
>> ovirt-vmconsole-host-sshd.service             enabled
>>
>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-agent.service
>> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
>>    Active: active (running) since Thu 2017-06-15 08:56:15 EDT; 21min ago
>>  Main PID: 3150 (ovirt-ha-agent)
>>    CGroup: /system.slice/ovirt-ha-agent.service
>>            └─3150 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
>>
>> Jun 15 08:56:15 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
>> Jun 15 08:56:15 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...
>> Jun 15 09:17:18 ovirt-hyp-01.example.lan ovirt-ha-agent[3150]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
>>
>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-broker.service
>> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
>>    Active: active (running) since Thu 2017-06-15 08:54:06 EDT; 24min ago
>>  Main PID: 968 (ovirt-ha-broker)
>>    CGroup: /system.slice/ovirt-ha-broker.service
>>            └─968 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
>>
>> Jun 15 08:54:06 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
>> Jun 15 08:54:06 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
>> Jun 15 08:56:16 ovirt-hyp-01.example.lan ovirt-ha-broker[968]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: '...1b55bcf76'
>>                                                                Traceback (most recent call last):
>>                                                                  File "/usr/lib/python2.7/site-packages/ovirt...
>> Hint: Some lines were ellipsized, use -l to show in full.
>>
>> [root@ovirt-hyp-01 ~]# systemctl restart ovirt-ha-agent.service
>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-agent.service
>> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
>>    Active: active (running) since Thu 2017-06-15 09:19:21 EDT; 26s ago
>>  Main PID: 8563 (ovirt-ha-agent)
>>    CGroup: /system.slice/ovirt-ha-agent.service
>>            └─8563 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
>>
>> Jun 15 09:19:21 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
>> Jun 15 09:19:21 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...
>>
>> [root@ovirt-hyp-01 ~]# systemctl restart ovirt-ha-broker.service
>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-broker.service
>> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
>>    Active: active (running) since Thu 2017-06-15 09:20:59 EDT; 28s ago
>>  Main PID: 8844 (ovirt-ha-broker)
>>    CGroup: /system.slice/ovirt-ha-broker.service
>>            └─8844 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
>>
>> Jun 15 09:20:59 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
>> Jun 15 09:20:59 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
>>
>> On Jun 14, 2017 4:45 AM, "Sahina Bose" <[email protected]> wrote:
>>
>>> What does the output of "hosted-engine --vm-status" and "gluster volume status engine" tell you? Are all the bricks running as per gluster vol status?
>>>
>>> Can you try to restart the ovirt-ha-agent and ovirt-ha-broker services?
>>>
>>> If HE still has issues powering up, please provide agent.log and broker.log from /var/log/ovirt-hosted-engine-ha and gluster mount logs from /var/log/glusterfs/rhev-data-center-mnt <engine>.log
>>>
>>> On Thu, Jun 8, 2017 at 6:57 PM, Joel Diaz <[email protected]> wrote:
>>>
>>>> Good morning oVirt community,
>>>>
>>>> I'm running a three-host gluster environment with hosted engine.
>>>>
>>>> Yesterday the engine went down and has not been able to come up properly. It tries to start on all three hosts.
>>>>
>>>> I have two gluster volumes, data and engine. The data storage domain volume is no longer mounted, but the engine volume is up. I've restarted the gluster service and made sure both volumes were running. The data volume will not mount.
>>>>
>>>> How can I get the engine running properly again?
>>>>
>>>> Thanks,
>>>>
>>>> Joel
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> [email protected]
>>>> http://lists.ovirt.org/mailman/listinfo/users
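P.S. Since vdsm reports the engine VM as Up on ovirt-hyp-02 while its IP never answers, my next step is to look at the VM from the host side rather than over the network. This is only a rough sketch of what I plan to run, not something I've tried yet (the VNC-password step assumes the HE VM still has a graphical console defined); please correct me if any of it is off:

[root@ovirt-hyp-02 ~]# hosted-engine --vm-status              # confirm which host currently claims the VM
[root@ovirt-hyp-02 ~]# virsh -r list --all                    # read-only libvirt view: is the HostedEngine VM actually running?
[root@ovirt-hyp-02 ~]# hosted-engine --console                # attach to the HE VM serial console, if the VM is up
[root@ovirt-hyp-02 ~]# hosted-engine --add-console-password   # set a temporary VNC password as an alternative way in
[root@ovirt-hyp-02 ~]# ip neigh show 192.168.170.149          # check whether the engine's IP ever maps to a MAC on this segment

If I can get into the VM, I'll check "systemctl status ovirt-engine" and the guest's network configuration.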
_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users

