OK, Simone,

Please let me know if I can provide any additional log files. Thanks for
taking the time to look into this.

Joel

On Jun 16, 2017 8:59 AM, "Sahina Bose" <[email protected]> wrote:

> I don't notice anything wrong on the gluster end.
>
> Maybe Simone can help take a look at the HE behaviour?
>
> On Fri, Jun 16, 2017 at 6:14 PM, Joel Diaz <[email protected]> wrote:
>
>> Good morning,
>>
>> Info requested below.
>>
>> [root@ovirt-hyp-02 ~]# hosted-engine --vm-start
>> Exception in thread Client localhost:54321 (most likely raised during interpreter shutdown): VM exists and its status is Up
>>
>> [root@ovirt-hyp-02 ~]# ping engine
>> PING engine.example.lan (192.168.170.149) 56(84) bytes of data.
>> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=1 Destination Host Unreachable
>> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=2 Destination Host Unreachable
>> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=3 Destination Host Unreachable
>> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=4 Destination Host Unreachable
>> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=5 Destination Host Unreachable
>> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=6 Destination Host Unreachable
>> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=7 Destination Host Unreachable
>> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=8 Destination Host Unreachable
>>
>> [root@ovirt-hyp-02 ~]# gluster volume status engine
>> Status of volume: engine
>> Gluster process                                      TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 192.168.170.141:/gluster_bricks/engine/engine  49159     0          Y       1799
>> Brick 192.168.170.143:/gluster_bricks/engine/engine  49159     0          Y       2900
>> Self-heal Daemon on localhost                        N/A       N/A        Y       2914
>> Self-heal Daemon on ovirt-hyp-01.example.lan         N/A       N/A        Y       1854
>>
>> Task Status of Volume engine
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> [root@ovirt-hyp-02 ~]# gluster volume heal engine info
>> Brick 192.168.170.141:/gluster_bricks/engine/engine
>> Status: Connected
>> Number of entries: 0
>>
>> Brick 192.168.170.143:/gluster_bricks/engine/engine
>> Status: Connected
>> Number of entries: 0
>>
>> Brick 192.168.170.147:/gluster_bricks/engine/engine
>> Status: Connected
>> Number of entries: 0
>>
>> [root@ovirt-hyp-02 ~]# cat /var/log/glusterfs/rhev-data-center-mnt-glusterSD-ovirt-hyp-01.example.lan\:engine.log
>> [2017-06-15 13:37:02.009436] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
>>
>> Each of the three hosts sends out the following notifications about every 15 minutes:
>>
>> Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineDown-EngineStart.
>> Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineStart-EngineStarting.
>> Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineStarting-EngineForceStop.
>> Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineForceStop-EngineDown.
>>
>> Please let me know if you need any additional information.
>>
>> Thank you,
>>
>> Joel
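Since vdsm reports the engine VM as Up while its address is unreachable, the next useful data point is usually the VM's own console. A minimal sketch of reaching it, assuming a default hosted-engine deployment (the VNC port below is illustrative; take the real one from the dumpxml output, and note that newer builds may also offer "hosted-engine --console" for the serial console):

  [root@ovirt-hyp-02 ~]# hosted-engine --add-console-password        # sets a temporary console password
  [root@ovirt-hyp-02 ~]# virsh -r dumpxml HostedEngine | grep -i -e vnc -e spice   # find the actual graphics port
  [root@ovirt-hyp-02 ~]# remote-viewer vnc://ovirt-hyp-02.example.lan:5900         # port is an assumption

From the console one can check whether the guest booted at all, and whether its network and the ovirt-engine service came up.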
>>
>> On Jun 16, 2017 2:52 AM, "Sahina Bose" <[email protected]> wrote:
>>
>>> From the agent.log:
>>>
>>> MainThread::INFO::2017-06-15 11:16:50,583::states::473::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine vm is running on host ovirt-hyp-02.example.lan (id 2)
>>>
>>> It looks like the HE VM was started successfully. Is it possible that the ovirt-engine service could not be started on the HE VM? Could you try to start the HE VM with the command below and then log into the VM console?
>>> # hosted-engine --vm-start
>>>
>>> Also, please check:
>>> # gluster volume status engine
>>> # gluster volume heal engine info
>>>
>>> Please also check if there are errors in the gluster mount logs, at /var/log/glusterfs/rhev-data-center-mnt..<engine>.log
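A quick way to watch the HA state machine while it cycles, using the default ovirt-hosted-engine-ha log locations mentioned later in this thread (run on any of the three hosts):

  [root@ovirt-hyp-01 ~]# tail -f /var/log/ovirt-hosted-engine-ha/agent.log
  [root@ovirt-hyp-01 ~]# grep -E 'Engine(Start|Starting|ForceStop|Down|Up)' /var/log/ovirt-hosted-engine-ha/agent.log | tail -20

The EngineStarting-EngineForceStop transition is the interesting one: the agent saw the VM start but never saw a healthy engine, so the health-check messages around that timestamp are what to read.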
>>>
>>> On Thu, Jun 15, 2017 at 8:53 PM, Joel Diaz <[email protected]> wrote:
>>>
>>>> Sorry, I forgot to attach the requested logs in the previous email.
>>>>
>>>> Thanks,
>>>>
>>>> On Jun 15, 2017 9:38 AM, "Joel Diaz" <[email protected]> wrote:
>>>>
>>>> Good morning,
>>>>
>>>> Requested info below, along with some additional info.
>>>>
>>>> You'll notice the data volume is not mounted.
>>>>
>>>> Any help in getting HE back running would be greatly appreciated.
>>>>
>>>> Thank you,
>>>>
>>>> Joel
>>>>
>>>> [root@ovirt-hyp-01 ~]# hosted-engine --vm-status
>>>>
>>>> --== Host 1 status ==--
>>>>
>>>> conf_on_shared_storage             : True
>>>> Status up-to-date                  : False
>>>> Hostname                           : ovirt-hyp-01.example.lan
>>>> Host ID                            : 1
>>>> Engine status                      : unknown stale-data
>>>> Score                              : 3400
>>>> stopped                            : False
>>>> Local maintenance                  : False
>>>> crc32                              : 5558a7d3
>>>> local_conf_timestamp               : 20356
>>>> Host timestamp                     : 20341
>>>> Extra metadata (valid at timestamp):
>>>>     metadata_parse_version=1
>>>>     metadata_feature_version=1
>>>>     timestamp=20341 (Fri Jun  9 14:38:57 2017)
>>>>     host-id=1
>>>>     score=3400
>>>>     vm_conf_refresh_time=20356 (Fri Jun  9 14:39:11 2017)
>>>>     conf_on_shared_storage=True
>>>>     maintenance=False
>>>>     state=EngineDown
>>>>     stopped=False
>>>>
>>>> --== Host 2 status ==--
>>>>
>>>> conf_on_shared_storage             : True
>>>> Status up-to-date                  : False
>>>> Hostname                           : ovirt-hyp-02.example.lan
>>>> Host ID                            : 2
>>>> Engine status                      : unknown stale-data
>>>> Score                              : 3400
>>>> stopped                            : False
>>>> Local maintenance                  : False
>>>> crc32                              : 936d4cf3
>>>> local_conf_timestamp               : 20351
>>>> Host timestamp                     : 20337
>>>> Extra metadata (valid at timestamp):
>>>>     metadata_parse_version=1
>>>>     metadata_feature_version=1
>>>>     timestamp=20337 (Fri Jun  9 14:39:03 2017)
>>>>     host-id=2
>>>>     score=3400
>>>>     vm_conf_refresh_time=20351 (Fri Jun  9 14:39:17 2017)
>>>>     conf_on_shared_storage=True
>>>>     maintenance=False
>>>>     state=EngineDown
>>>>     stopped=False
>>>>
>>>> --== Host 3 status ==--
>>>>
>>>> conf_on_shared_storage             : True
>>>> Status up-to-date                  : False
>>>> Hostname                           : ovirt-hyp-03.example.lan
>>>> Host ID                            : 3
>>>> Engine status                      : unknown stale-data
>>>> Score                              : 3400
>>>> stopped                            : False
>>>> Local maintenance                  : False
>>>> crc32                              : f646334e
>>>> local_conf_timestamp               : 20391
>>>> Host timestamp                     : 20377
>>>> Extra metadata (valid at timestamp):
>>>>     metadata_parse_version=1
>>>>     metadata_feature_version=1
>>>>     timestamp=20377 (Fri Jun  9 14:39:37 2017)
>>>>     host-id=3
>>>>     score=3400
>>>>     vm_conf_refresh_time=20391 (Fri Jun  9 14:39:51 2017)
>>>>     conf_on_shared_storage=True
>>>>     maintenance=False
>>>>     state=EngineStop
>>>>     stopped=False
>>>>     timeout=Thu Jan  1 00:43:08 1970
>>>>
>>>> [root@ovirt-hyp-01 ~]# gluster peer status
>>>> Number of Peers: 2
>>>>
>>>> Hostname: 192.168.170.143
>>>> Uuid: b2b30d05-cf91-4567-92fd-022575e082f5
>>>> State: Peer in Cluster (Connected)
>>>> Other names:
>>>> 10.0.0.2
>>>>
>>>> Hostname: 192.168.170.147
>>>> Uuid: 4e50acc4-f3cb-422d-b499-fb5796a53529
>>>> State: Peer in Cluster (Connected)
>>>> Other names:
>>>> 10.0.0.3
>>>>
>>>> [root@ovirt-hyp-01 ~]# gluster volume info all
>>>>
>>>> Volume Name: data
>>>> Type: Replicate
>>>> Volume ID: 1d6bb110-9be4-4630-ae91-36ec1cf6cc02
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 1 x (2 + 1) = 3
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: 192.168.170.141:/gluster_bricks/data/data
>>>> Brick2: 192.168.170.143:/gluster_bricks/data/data
>>>> Brick3: 192.168.170.147:/gluster_bricks/data/data (arbiter)
>>>> Options Reconfigured:
>>>> nfs.disable: on
>>>> performance.readdir-ahead: on
>>>> transport.address-family: inet
>>>> performance.quick-read: off
>>>> performance.read-ahead: off
>>>> performance.io-cache: off
>>>> performance.stat-prefetch: off
>>>> performance.low-prio-threads: 32
>>>> network.remote-dio: off
>>>> cluster.eager-lock: enable
>>>> cluster.quorum-type: auto
>>>> cluster.server-quorum-type: server
>>>> cluster.data-self-heal-algorithm: full
>>>> cluster.locking-scheme: granular
>>>> cluster.shd-max-threads: 8
>>>> cluster.shd-wait-qlength: 10000
>>>> features.shard: on
>>>> user.cifs: off
>>>> storage.owner-uid: 36
>>>> storage.owner-gid: 36
>>>> network.ping-timeout: 30
>>>> performance.strict-o-direct: on
>>>> cluster.granular-entry-heal: enable
>>>>
>>>> Volume Name: engine
>>>> Type: Replicate
>>>> Volume ID: b160f0b2-8bd3-4ff2-a07c-134cab1519dd
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 1 x (2 + 1) = 3
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: 192.168.170.141:/gluster_bricks/engine/engine
>>>> Brick2: 192.168.170.143:/gluster_bricks/engine/engine
>>>> Brick3: 192.168.170.147:/gluster_bricks/engine/engine (arbiter)
>>>> Options Reconfigured:
>>>> nfs.disable: on
>>>> performance.readdir-ahead: on
>>>> transport.address-family: inet
>>>> performance.quick-read: off
>>>> performance.read-ahead: off
>>>> performance.io-cache: off
>>>> performance.stat-prefetch: off
>>>> performance.low-prio-threads: 32
>>>> network.remote-dio: off
>>>> cluster.eager-lock: enable
>>>> cluster.quorum-type: auto
>>>> cluster.server-quorum-type: server
>>>> cluster.data-self-heal-algorithm: full
>>>> cluster.locking-scheme: granular
>>>> cluster.shd-max-threads: 8
>>>> cluster.shd-wait-qlength: 10000
>>>> features.shard: on
>>>> user.cifs: off
>>>> storage.owner-uid: 36
>>>> storage.owner-gid: 36
>>>> network.ping-timeout: 30
>>>> performance.strict-o-direct: on
>>>> cluster.granular-entry-heal: enable
>>>>
>>>> [root@ovirt-hyp-01 ~]# df -h
>>>> Filesystem                                    Size  Used  Avail  Use%  Mounted on
>>>> /dev/mapper/centos_ovirt--hyp--01-root         50G  4.1G    46G    9%  /
>>>> devtmpfs                                      7.7G     0   7.7G    0%  /dev
>>>> tmpfs                                         7.8G     0   7.8G    0%  /dev/shm
>>>> tmpfs                                         7.8G  8.7M   7.7G    1%  /run
>>>> tmpfs                                         7.8G     0   7.8G    0%  /sys/fs/cgroup
>>>> /dev/mapper/centos_ovirt--hyp--01-home         61G   33M    61G    1%  /home
>>>> /dev/mapper/gluster_vg_sdb-gluster_lv_engine   50G  7.6G    43G   16%  /gluster_bricks/engine
>>>> /dev/mapper/gluster_vg_sdb-gluster_lv_data    730G  157G   574G   22%  /gluster_bricks/data
>>>> /dev/sda1                                     497M  173M   325M   35%  /boot
>>>> ovirt-hyp-01.example.lan:engine                50G  7.6G    43G   16%  /rhev/data-center/mnt/glusterSD/ovirt-hyp-01.example.lan:engine
>>>> tmpfs                                         1.6G     0   1.6G    0%  /run/user/0
>>>>
>>>> [root@ovirt-hyp-01 ~]# systemctl list-unit-files | grep ovirt
>>>> ovirt-ha-agent.service                 enabled
>>>> ovirt-ha-broker.service                enabled
>>>> ovirt-imageio-daemon.service           disabled
>>>> ovirt-vmconsole-host-sshd.service      enabled
>>>>
>>>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-agent.service
>>>> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
>>>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
>>>>    Active: active (running) since Thu 2017-06-15 08:56:15 EDT; 21min ago
>>>>  Main PID: 3150 (ovirt-ha-agent)
>>>>    CGroup: /system.slice/ovirt-ha-agent.service
>>>>            └─3150 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
>>>>
>>>> Jun 15 08:56:15 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
>>>> Jun 15 08:56:15 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...
>>>> Jun 15 09:17:18 ovirt-hyp-01.example.lan ovirt-ha-agent[3150]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
>>>>
>>>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-broker.service
>>>> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
>>>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
>>>>    Active: active (running) since Thu 2017-06-15 08:54:06 EDT; 24min ago
>>>>  Main PID: 968 (ovirt-ha-broker)
>>>>    CGroup: /system.slice/ovirt-ha-broker.service
>>>>            └─968 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
>>>>
>>>> Jun 15 08:54:06 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
>>>> Jun 15 08:54:06 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
>>>> Jun 15 08:56:16 ovirt-hyp-01.example.lan ovirt-ha-broker[968]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: '...1b55bcf76'
>>>>                                                                Traceback (most recent call last):
>>>>                                                                  File "/usr/lib/python2.7/site-packages/ovirt...
>>>> Hint: Some lines were ellipsized, use -l to show in full.
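A side note on the unmounted data domain visible in the df output above: the storage-domain mounts under /rhev/data-center/mnt/glusterSD/ are set up by VDSM once the engine is up and activates the host, so a missing data mount is expected while HE is down. For inspection only, the volume can be test-mounted by hand (temporary mount point, not the VDSM path; standard glusterfs mount syntax):

  [root@ovirt-hyp-01 ~]# gluster volume status data
  [root@ovirt-hyp-01 ~]# mkdir -p /mnt/data-test
  [root@ovirt-hyp-01 ~]# mount -t glusterfs ovirt-hyp-01.example.lan:/data /mnt/data-test
  [root@ovirt-hyp-01 ~]# ls /mnt/data-test && umount /mnt/data-test   # verify contents, then clean up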
>>>> [root@ovirt-hyp-01 ~]# systemctl restart ovirt-ha-agent.service
>>>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-agent.service
>>>> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
>>>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
>>>>    Active: active (running) since Thu 2017-06-15 09:19:21 EDT; 26s ago
>>>>  Main PID: 8563 (ovirt-ha-agent)
>>>>    CGroup: /system.slice/ovirt-ha-agent.service
>>>>            └─8563 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
>>>>
>>>> Jun 15 09:19:21 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
>>>> Jun 15 09:19:21 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...
>>>>
>>>> [root@ovirt-hyp-01 ~]# systemctl restart ovirt-ha-broker.service
>>>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-broker.service
>>>> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
>>>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
>>>>    Active: active (running) since Thu 2017-06-15 09:20:59 EDT; 28s ago
>>>>  Main PID: 8844 (ovirt-ha-broker)
>>>>    CGroup: /system.slice/ovirt-ha-broker.service
>>>>            └─8844 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
>>>>
>>>> Jun 15 09:20:59 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
>>>> Jun 15 09:20:59 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
>>>>
>>>> On Jun 14, 2017 4:45 AM, "Sahina Bose" <[email protected]> wrote:
>>>>
>>>>> What do the outputs of "hosted-engine --vm-status" and "gluster volume
>>>>> status engine" tell you? Are all the bricks running as per gluster vol
>>>>> status?
>>>>>
>>>>> Can you try to restart the ovirt-ha-agent and ovirt-ha-broker services?
>>>>>
>>>>> If HE still has issues powering up, please provide agent.log and
>>>>> broker.log from /var/log/ovirt-hosted-engine-ha and the gluster mount
>>>>> logs from /var/log/glusterfs/rhev-data-center-mnt <engine>.log
>>>>>
>>>>> On Thu, Jun 8, 2017 at 6:57 PM, Joel Diaz <[email protected]> wrote:
>>>>>
>>>>>> Good morning oVirt community,
>>>>>>
>>>>>> I'm running a three-host gluster environment with hosted engine.
>>>>>>
>>>>>> Yesterday the engine went down and has not been able to come up
>>>>>> properly. It tries to start on all three hosts.
>>>>>>
>>>>>> I have two gluster volumes, data and engine. The data storage domain
>>>>>> volume is no longer mounted, but the engine volume is up. I've restarted
>>>>>> the gluster service and made sure both volumes were running. The data
>>>>>> volume will not mount.
>>>>>>
>>>>>> How can I get the engine running properly again?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Joel
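For reference, the usual manual recovery sequence when HE will not come up on its own is to pause the agents' automatic restart loop first and then start the VM by hand (standard hosted-engine commands; run the start on one host only):

  [root@ovirt-hyp-01 ~]# hosted-engine --set-maintenance --mode=global   # stop the agents' restart loop
  [root@ovirt-hyp-01 ~]# hosted-engine --vm-start
  [root@ovirt-hyp-01 ~]# hosted-engine --vm-status                       # repeat until the engine reports up
  [root@ovirt-hyp-01 ~]# hosted-engine --set-maintenance --mode=none     # hand control back to the agents

Global maintenance keeps ovirt-ha-agent from force-stopping the VM mid-boot, which matches the EngineStarting-EngineForceStop loop reported above.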
_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users

