Hi Ritesh,

Yes, I have tried the Gluster deployment several times. I was able to resolve the "kvdo not installed" issue, but no matter what I have tried recently, I cannot get Gluster to deploy. I previously had a hyperconverged oVirt/Gluster cluster with VDO running successfully on this same hardware and switches. What has changed since then: the storage network is now direct connect, and I am installing with oVirt v4.4. I was last successful with oVirt v4.2.
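For context, name resolution on the direct-connect storage network is static hosts entries on each node rather than DNS; a minimal sketch of what each node's /etc/hosts might carry (the addresses and names here are illustrative, not the actual ones):

```
# /etc/hosts fragment on each node (illustrative addresses/names;
# direct-connect storage NICs, no switch or DNS on these links)
10.10.10.1   host1-storage.sn.example.com   host1-storage
10.10.10.2   host2-storage.sn.example.com   host2-storage
10.10.10.3   host3-storage.sn.example.com   host3-storage
```

Each node needs entries for the other two peers' storage addresses, since the point-to-point links bypass the management network's DHCP/DNS.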
I retried the Gluster deployment after cleanup from within the Cockpit web console, using the suggested ansible-playbook, and again from a fresh image of the oVirt Node v4.4 ISO. Ping from each host to the other two works on both the management and storage networks. I am using DHCP for the management network and hosts-file entries for the direct-connect storage network.

Thanks again for your help,
Charles

On Mon, Jan 11, 2021 at 10:03 PM Ritesh Chikatwar <[email protected]> wrote:
>
>
> On Tue, Jan 12, 2021, 2:04 AM Charles Lam <[email protected]> wrote:
>
>> Dear Strahil and Ritesh,
>>
>> Thank you both. I am back where I started with:
>>
>> "One or more bricks could be down. Please execute the command again after
>> bringing all bricks online and finishing any pending heals\nVolume heal
>> failed.", "stdout_lines": ["One or more bricks could be down. Please
>> execute the command again after bringing all bricks online and finishing
>> any pending heals", "Volume heal failed."]
>>
>> Regarding my most recent issue:
>>
>> "vdo: ERROR - Kernel module kvdo not installed\nvdo: ERROR - modprobe:
>> FATAL: Module
>> kvdo not found in directory /lib/modules/4.18.0-240.1.1.el8_3.x86_64\n"
>>
>> Per Strahil's note, I checked for kvdo:
>>
>> [[email protected] conf.d]# rpm -qa | grep vdo
>> libblockdev-vdo-2.24-1.el8.x86_64
>> vdo-6.2.3.114-14.el8.x86_64
>> kmod-kvdo-6.2.2.117-65.el8.x86_64
>> [[email protected] conf.d]#
>>
>> [[email protected] conf.d]# rpm -qa | grep vdo
>> libblockdev-vdo-2.24-1.el8.x86_64
>> vdo-6.2.3.114-14.el8.x86_64
>> kmod-kvdo-6.2.2.117-65.el8.x86_64
>> [[email protected] conf.d]#
>>
>> [[email protected] ~]# rpm -qa | grep vdo
>> libblockdev-vdo-2.24-1.el8.x86_64
>> vdo-6.2.3.114-14.el8.x86_64
>> kmod-kvdo-6.2.2.117-65.el8.x86_64
>> [[email protected] ~]#
>>
>> I found
>> https://unix.stackexchange.com/questions/624011/problem-on-centos-8-with-creating-vdo-kernel-module-kvdo-not-installed
>> which pointed to https://bugs.centos.org/view.php?id=17928.
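A side note on the modprobe failure quoted above: kmod-kvdo installs its module under a specific /lib/modules/<kernel> directory, and "Module kvdo not found in directory /lib/modules/<kernel>" typically means that directory does not match the running kernel. A small sketch of how one could script the comparison (the helper name and example paths are mine, not from any tool):

```shell
# Extract the kernel version component from a module path such as
# /lib/modules/4.18.0-259.el8.x86_64/weak-updates/kmod-kvdo/vdo/kvdo.ko
# (helper name and paths are illustrative).
kernel_of_module_path() {
  printf '%s\n' "$1" | sed -n 's|^/lib/modules/\([^/]*\)/.*|\1|p'
}

# On a live host one would compare against the running kernel, e.g.:
#   mod=$(rpm -ql kmod-kvdo | grep -m1 '\.ko')
#   [ "$(kernel_of_module_path "$mod")" = "$(uname -r)" ] \
#     || echo "kvdo was built for a different kernel than the one running"
```

If the two versions differ, booting into the kernel that kmod-kvdo was built for (or upgrading kernel-core, as described below) should let modprobe find kvdo.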
>> As suggested on the CentOS bug tracker, I attempted to manually install
>>
>> vdo-support-6.2.4.14-14.el8.x86_64
>> vdo-6.2.4.14-14.el8.x86_64
>> kmod-kvdo-6.2.3.91-73.el8.x86_64
>>
>> but there was a dependency that kernel-core be newer than what I had
>> installed, so I manually upgraded kernel-core to
>> kernel-core-4.18.0-259.el8.x86_64.rpm, then upgraded vdo and kmod-kvdo to
>>
>> vdo-6.2.4.14-14.el8.x86_64.rpm
>> kmod-kvdo-6.2.4.26-76.el8.x86_64.rpm
>>
>> and installed vdo-support-6.2.4.14-14.el8.x86_64.rpm. Upon clean-up and
>> redeploy, I am now back at the Gluster deploy failing at
>>
>> TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on]
>> **********
>> task path:
>> /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67
>> failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'engine', 'brick':
>> '/gluster_bricks/engine/engine', 'arbiter': 0}) => {"ansible_loop_var":
>> "item", "changed": true, "cmd": ["gluster", "volume", "heal", "engine",
>> "granular-entry-heal", "enable"], "delta": "0:00:10.098573", "end":
>> "2021-01-11 19:27:05.333720", "item": {"arbiter": 0, "brick":
>> "/gluster_bricks/engine/engine", "volname": "engine"}, "msg": "non-zero
>> return code", "rc": 107, "start": "2021-01-11 19:26:55.235147", "stderr":
>> "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please
>> execute the command again after bringing all bricks online and finishing
>> any pending heals\nVolume heal failed.", "stdout_lines": ["One or more
>> bricks could be down. Please execute the command again after bringing all
>> bricks online and finishing any pending heals", "Volume heal failed."]}
>> failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'data', 'brick':
>> '/gluster_bricks/data/data', 'arbiter': 0}) => {"ansible_loop_var": "item",
>> "changed": true, "cmd": ["gluster", "volume", "heal", "data",
>> "granular-entry-heal", "enable"], "delta": "0:00:10.099670", "end":
>> "2021-01-11 19:27:20.564554", "item": {"arbiter": 0, "brick":
>> "/gluster_bricks/data/data", "volname": "data"}, "msg": "non-zero return
>> code", "rc": 107, "start": "2021-01-11 19:27:10.464884", "stderr": "",
>> "stderr_lines": [], "stdout": "One or more bricks could be down. Please
>> execute the command again after bringing all bricks online and finishing
>> any pending heals\nVolume heal failed.", "stdout_lines": ["One or more
>> bricks could be down. Please execute the command again after bringing all
>> bricks online and finishing any pending heals", "Volume heal failed."]}
>> failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'vmstore', 'brick':
>> '/gluster_bricks/vmstore/vmstore', 'arbiter': 0}) => {"ansible_loop_var":
>> "item", "changed": true, "cmd": ["gluster", "volume", "heal", "vmstore",
>> "granular-entry-heal", "enable"], "delta": "0:00:10.104624", "end":
>> "2021-01-11 19:27:35.774230", "item": {"arbiter": 0, "brick":
>> "/gluster_bricks/vmstore/vmstore", "volname": "vmstore"}, "msg": "non-zero
>> return code", "rc": 107, "start": "2021-01-11 19:27:25.669606", "stderr":
>> "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please
>> execute the command again after bringing all bricks online and finishing
>> any pending heals\nVolume heal failed.", "stdout_lines": ["One or more
>> bricks could be down. Please execute the command again after bringing all
>> bricks online and finishing any pending heals", "Volume heal failed."]}
>>
>> NO MORE HOSTS LEFT *************************************************************
>>
>> NO MORE HOSTS LEFT *************************************************************
>>
>> PLAY RECAP *********************************************************************
>> fmov1n1.sn.dtcorp.com : ok=70  changed=29  unreachable=0  failed=1  skipped=188  rescued=0  ignored=1
>> fmov1n2.sn.dtcorp.com : ok=68  changed=27  unreachable=0  failed=0  skipped=163  rescued=0  ignored=1
>> fmov1n3.sn.dtcorp.com : ok=68  changed=27  unreachable=0  failed=0  skipped=163  rescued=0  ignored=1
>>
>> Please check /var/log/cockpit/ovirt-dashboard/gluster-deployment.log for
>> more informations.
>>
>> I went back to Strahil's recommendation to restart Gluster and enable
>> granular-entry-heal. This fails; for example:
>>
>> [root@host1 ~]# gluster volume heal data granular-entry-heal enable
>> One or more bricks could be down. Please execute the command again after
>> bringing all bricks online and finishing any pending heals
>> Volume heal failed.
>>
>> I have followed Ritesh's suggestion:
>>
>> [root@host1 ~]# ansible-playbook
>> /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/tasks/gluster_cleanup.yml
>> -i /etc/ansible/hc_wizard_inventory.yml
>>
>> which appeared to execute successfully:
>>
>> PLAY RECAP **********************************************************************************************************
>> fmov1n1.sn.dtcorp.com : ok=11  changed=2  unreachable=0  failed=0  skipped=2  rescued=0  ignored=0
>> fmov1n2.sn.dtcorp.com : ok=9  changed=1  unreachable=0  failed=0  skipped=1  rescued=0  ignored=0
>> fmov1n3.sn.dtcorp.com : ok=9  changed=1  unreachable=0  failed=0  skipped=1  rescued=0  ignored=0
>>
>
> So after this have you tried gluster deployment..?
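Since the heal step keeps failing with rc 107 ("One or more bricks could be down"), one thing worth scripting before re-running it is a check that every brick reports Online=Y in `gluster volume status`. A sketch of such a pre-check (the helper is mine, not from the deployment role; it assumes the standard status table layout with unwrapped brick lines, where the Online column is second from last):

```shell
# Count bricks reported offline (Online column "N") in `gluster volume
# status <vol>` output read from stdin. Assumes the standard table layout
# with one brick per line; the helper name is illustrative.
offline_bricks() {
  awk '/^Brick / { if ($(NF-1) == "N") n++ } END { print n+0 }'
}

# On a live host (not executed here), one might gate the retry like:
#   for vol in engine data vmstore; do
#     n=$(gluster volume status "$vol" | offline_bricks)
#     [ "$n" -eq 0 ] && gluster volume heal "$vol" granular-entry-heal enable
#   done
```

If all bricks show Online=Y (as in the status output below) and the command still returns 107, the CLI may be failing to reach one of the glusterd peers over the expected network, which would point back at the direct-connect storage addressing.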
>
>>
>> Here is the info Strahil requested when I first reported this issue on
>> December 18th, re-run today, January 11:
>>
>> [root@host1 ~]# gluster pool list
>> UUID                                  Hostname          State
>> 4964020a-9632-43eb-9468-798920e98559  host2.domain.com  Connected
>> f0718e4f-1ac6-4b82-a8d7-a4d31cd0f38b  host3.domain.com  Connected
>> 6ba94e82-579c-4ae2-b3c5-bef339c6f795  localhost         Connected
>> [root@host1 ~]# gluster volume list
>> data
>> engine
>> vmstore
>> [root@host1 ~]# for i in $(gluster volume list); do gluster volume status $i; gluster volume info $i; echo "###########################################################################################################"; done
>> Status of volume: data
>> Gluster process                                    TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick host1.domain.com:/gluster_bricks/data/data   49153     0          Y       406272
>> Brick host2.domain.com:/gluster_bricks/data/data   49153     0          Y       360300
>> Brick host3.domain.com:/gluster_bricks/data/data   49153     0          Y       360082
>> Self-heal Daemon on localhost                      N/A       N/A        Y       413227
>> Self-heal Daemon on host2.domain.com               N/A       N/A        Y       360223
>> Self-heal Daemon on host3.domain.com               N/A       N/A        Y       360003
>>
>> Task Status of Volume data
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> Volume Name: data
>> Type: Replicate
>> Volume ID: ed65a922-bd85-4574-ba21-25b3755acbce
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: host1.domain.com:/gluster_bricks/data/data
>> Brick2: host2.domain.com:/gluster_bricks/data/data
>> Brick3: host3.domain.com:/gluster_bricks/data/data
>> Options Reconfigured:
>> performance.client-io-threads: on
>> nfs.disable: on
>> storage.fips-mode-rchecksum: on
>> transport.address-family: inet
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.low-prio-threads: 32
>> network.remote-dio: off
>> cluster.eager-lock: enable
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> cluster.data-self-heal-algorithm: full
>> cluster.locking-scheme: granular
>> cluster.shd-max-threads: 8
>> cluster.shd-wait-qlength: 10000
>> features.shard: on
>> user.cifs: off
>> cluster.choose-local: off
>> client.event-threads: 4
>> server.event-threads: 4
>> storage.owner-uid: 36
>> storage.owner-gid: 36
>> network.ping-timeout: 30
>> performance.strict-o-direct: on
>>
>> ###########################################################################################################
>> Status of volume: engine
>> Gluster process                                        TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick host1.domain.com:/gluster_bricks/engine/engine   49152     0          Y       404563
>> Brick host2.domain.com:/gluster_bricks/engine/engine   49152     0          Y       360202
>> Brick host3.domain.com:/gluster_bricks/engine/engine   49152     0          Y       359982
>> Self-heal Daemon on localhost                          N/A       N/A        Y       413227
>> Self-heal Daemon on host3.domain.com                   N/A       N/A        Y       360003
>> Self-heal Daemon on host2.domain.com                   N/A       N/A        Y       360223
>>
>> Task Status of Volume engine
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> Volume Name: engine
>> Type: Replicate
>> Volume ID: 45d4ec84-38a1-41ff-b8ec-8b00eb658908
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: host1.domain.com:/gluster_bricks/engine/engine
>> Brick2: host2.domain.com:/gluster_bricks/engine/engine
>> Brick3: host3.domain.com:/gluster_bricks/engine/engine
>> Options Reconfigured:
>> performance.client-io-threads: on
>> nfs.disable: on
>> storage.fips-mode-rchecksum: on
>> transport.address-family: inet
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.low-prio-threads: 32
>> network.remote-dio: off
>> cluster.eager-lock: enable
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> cluster.data-self-heal-algorithm: full
>> cluster.locking-scheme: granular
>> cluster.shd-max-threads: 8
>> cluster.shd-wait-qlength: 10000
>> features.shard: on
>> user.cifs: off
>> cluster.choose-local: off
>> client.event-threads: 4
>> server.event-threads: 4
>> storage.owner-uid: 36
>> storage.owner-gid: 36
>> network.ping-timeout: 30
>> performance.strict-o-direct: on
>>
>> ###########################################################################################################
>> Status of volume: vmstore
>> Gluster process                                          TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick host1.domain.com:/gluster_bricks/vmstore/vmstore   49154     0          Y       407952
>> Brick host2.domain.com:/gluster_bricks/vmstore/vmstore   49154     0          Y       360389
>> Brick host3.domain.com:/gluster_bricks/vmstore/vmstore   49154     0          Y       360176
>> Self-heal Daemon on localhost                            N/A       N/A        Y       413227
>> Self-heal Daemon on host2.domain.com                     N/A       N/A        Y       360223
>> Self-heal Daemon on host3.domain.com                     N/A       N/A        Y       360003
>>
>> Task Status of Volume vmstore
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> Volume Name: vmstore
>> Type: Replicate
>> Volume ID: 27c8346c-0374-4108-a33a-0024007a9527
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: host1.domain.com:/gluster_bricks/vmstore/vmstore
>> Brick2: host2.domain.com:/gluster_bricks/vmstore/vmstore
>> Brick3: host3.domain.com:/gluster_bricks/vmstore/vmstore
>> Options Reconfigured:
>> performance.client-io-threads: on
>> nfs.disable: on
>> storage.fips-mode-rchecksum: on
>> transport.address-family: inet
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.low-prio-threads: 32
>> network.remote-dio: off
>> cluster.eager-lock: enable
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> cluster.data-self-heal-algorithm: full
>> cluster.locking-scheme: granular
>> cluster.shd-max-threads: 8
>> cluster.shd-wait-qlength: 10000
>> features.shard: on
>> user.cifs: off
>> cluster.choose-local: off
>> client.event-threads: 4
>> server.event-threads: 4
>> storage.owner-uid: 36
>> storage.owner-gid: 36
>> network.ping-timeout: 30
>> performance.strict-o-direct: on
>>
>> ###########################################################################################################
>> [root@host1 ~]#
>>
>> Again, further suggestions for troubleshooting are VERY much appreciated!
>>
>> Respectfully,
>> Charles
>> _______________________________________________
>> Users mailing list -- [email protected]
>> To unsubscribe send an email to [email protected]
>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>> oVirt Code of Conduct:
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
>> https://lists.ovirt.org/archives/list/[email protected]/message/A2NR63KWDQSXFS2CRWGRF4HNIR4YDX6K/
>>
>
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/5XMBLK57C5B67QQTNONWO7RVCMPKJCFZ/

