Hi Ritesh,

Yes, I have tried the Gluster deployment several times.  I was able to
resolve the "kvdo not installed" issue, but despite everything I have tried
recently I still cannot get Gluster to deploy.  I previously had a
hyperconverged oVirt/Gluster cluster with VDO running successfully on this
same hardware and switches.  What has changed since then: I switched the
storage network to direct connect, and I am now installing with oVirt v4.4.
I was last successful with oVirt v4.2.

I have attempted the Gluster deployment after cleaning up from within the
Cockpit web console, after running the suggested ansible-playbook, and after
re-imaging with the oVirt Node v4.4 ISO.  Ping from each host to the other
two works on both the management and storage networks.  I am using DHCP for
the management network and a hosts file for the direct-connect storage
network.
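
For reference, the storage entries on each node look like the following
sketch; the addresses and names here are placeholders, not my actual values:

```
# /etc/hosts excerpt for the direct-connect storage network
# (illustrative placeholder addresses and host names)
10.10.10.1   host1-storage.domain.com   host1-storage
10.10.10.2   host2-storage.domain.com   host2-storage
10.10.10.3   host3-storage.domain.com   host3-storage
```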

Thanks again for your help,
Charles

On Mon, Jan 11, 2021 at 10:03 PM Ritesh Chikatwar <[email protected]>
wrote:

>
>
> On Tue, Jan 12, 2021, 2:04 AM Charles Lam <[email protected]> wrote:
>
>> Dear Strahil and Ritesh,
>>
>> Thank you both.  I am back where I started with:
>>
>> "One or more bricks could be down. Please execute the command again after
>> bringing all bricks online and finishing any pending heals\nVolume heal
>> failed.", "stdout_lines": ["One or more bricks could be down. Please
>> execute the command again after bringing all bricks online and finishing
>> any pending heals", "Volume heal failed."]
>>
>> Regarding my most recent issue:
>>
>> "vdo: ERROR - Kernel module kvdo not installed\nvdo: ERROR - modprobe:
>> FATAL: Module
>> kvdo not found in directory /lib/modules/4.18.0-240.1.1.el8_3.x86_64\n"
>>
>> Per Strahil's note, I checked for kvdo:
>>
>> [[email protected] conf.d]# rpm -qa | grep vdo
>> libblockdev-vdo-2.24-1.el8.x86_64
>> vdo-6.2.3.114-14.el8.x86_64
>> kmod-kvdo-6.2.2.117-65.el8.x86_64
>> [[email protected] conf.d]#
>>
>> [[email protected] conf.d]# rpm -qa | grep vdo
>> libblockdev-vdo-2.24-1.el8.x86_64
>> vdo-6.2.3.114-14.el8.x86_64
>> kmod-kvdo-6.2.2.117-65.el8.x86_64
>> [[email protected] conf.d]#
>>
>> [[email protected] ~]# rpm -qa | grep vdo
>> libblockdev-vdo-2.24-1.el8.x86_64
>> vdo-6.2.3.114-14.el8.x86_64
>> kmod-kvdo-6.2.2.117-65.el8.x86_64
>> [[email protected] ~]#
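
For anyone hitting the same error: the modprobe failure names the exact
kernel directory it searched, so the mismatch is easy to pin down by
comparing that release against the running kernel.  A minimal sketch (the
path is copied from the error above; the live-host checks are commented out
since they only make sense on the node itself):

```shell
# Kernel directory that modprobe searched, per the error message above
dir="/lib/modules/4.18.0-240.1.1.el8_3.x86_64"

# The last path component is the kernel release the kvdo module must be
# built for
kver="${dir##*/}"
echo "$kver"

# On a live node, compare against the running kernel and look for the
# module file itself:
#   uname -r
#   find "$dir" -name 'kvdo*.ko*' 2>/dev/null
```

If `uname -r` does not match that release, modprobe is searching a
directory for a kernel other than the one the module package was built
against.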
>>
>> I found
>> https://unix.stackexchange.com/questions/624011/problem-on-centos-8-with-creating-vdo-kernel-module-kvdo-not-installed
>> which pointed to https://bugs.centos.org/view.php?id=17928.  As
>> suggested on the CentOS bug tracker I attempted to manually install
>>
>> vdo-support-6.2.4.14-14.el8.x86_64
>> vdo-6.2.4.14-14.el8.x86_64
>> kmod-kvdo-6.2.3.91-73.el8.x86_64
>>
>> but there was a dependency requiring a newer kernel-core than I had
>> installed, so I manually upgraded kernel-core to
>> kernel-core-4.18.0-259.el8.x86_64.rpm and then upgraded vdo and kmod-kvdo to
>>
>> vdo-6.2.4.14-14.el8.x86_64.rpm
>> kmod-kvdo-6.2.4.26-76.el8.x86_64.rpm
>>
>> and installed vdo-support-6.2.4.14-14.el8.x86_64.rpm.  After clean-up and
>> redeployment I am now back to the Gluster deployment failing at
>>
>> TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on]
>> **********
>> task path:
>> /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67
>> failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'engine', 'brick':
>> '/gluster_bricks/engine/engine', 'arbiter': 0}) => {"ansible_loop_var":
>> "item", "changed": true, "cmd": ["gluster", "volume", "heal", "engine",
>> "granular-entry-heal", "enable"], "delta": "0:00:10.098573", "end":
>> "2021-01-11 19:27:05.333720", "item": {"arbiter": 0, "brick":
>> "/gluster_bricks/engine/engine", "volname": "engine"}, "msg": "non-zero
>> return code", "rc": 107, "start": "2021-01-11 19:26:55.235147", "stderr":
>> "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please
>> execute the command again after bringing all bricks online and finishing
>> any pending heals\nVolume heal failed.", "stdout_lines": ["One or more
>> bricks could be down. Please execute the command again after bringing all
>> bricks online and finishing any pending heals", "Volume heal failed."]}
>> failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'data', 'brick':
>> '/gluster_bricks/data/data', 'arbiter': 0}) => {"ansible_loop_var": "item",
>> "changed": true, "cmd": ["gluster", "volume", "heal", "data",
>> "granular-entry-heal", "enable"], "delta": "0:00:10.099670", "end":
>> "2021-01-11 19:27:20.564554", "item": {"arbiter": 0, "brick":
>> "/gluster_bricks/data/data", "volname": "data"}, "msg": "non-zero return
>> code", "rc": 107, "start": "2021-01-11 19:27:10.464884", "stderr": "",
>> "stderr_lines": [], "stdout": "One or more bricks could be down. Please
>> execute the command again after bringing all bricks online and finishing
>> any pending heals\nVolume heal failed.", "stdout_lines": ["One or more
>> bricks could be down. Please execute the command again after bringing all
>> bricks online and finishing any pending heals", "Volume heal failed."]}
>> failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'vmstore', 'brick':
>> '/gluster_bricks/vmstore/vmstore', 'arbiter': 0}) => {"ansible_loop_var":
>> "item", "changed": true, "cmd": ["gluster", "volume", "heal", "vmstore",
>> "granular-entry-heal", "enable"], "delta": "0:00:10.104624", "end":
>> "2021-01-11 19:27:35.774230", "item": {"arbiter": 0, "brick":
>> "/gluster_bricks/vmstore/vmstore", "volname": "vmstore"}, "msg": "non-zero
>> return code", "rc": 107, "start": "2021-01-11 19:27:25.669606", "stderr":
>> "", "stderr_lines": [], "stdout": "One or more bricks could be down. Please
>> execute the command again after bringing all bricks online and finishing
>> any pending heals\nVolume heal failed.", "stdout_lines": ["One or more
>> bricks could be down. Please execute the command again after bringing all
>> bricks online and finishing any pending heals", "Volume heal failed."]}
>>
>> NO MORE HOSTS LEFT
>> *************************************************************
>>
>> NO MORE HOSTS LEFT
>> *************************************************************
>>
>> PLAY RECAP
>> *********************************************************************
>> fmov1n1.sn.dtcorp.com      : ok=70   changed=29   unreachable=0
>> failed=1    skipped=188  rescued=0    ignored=1
>> fmov1n2.sn.dtcorp.com      : ok=68   changed=27   unreachable=0
>> failed=0    skipped=163  rescued=0    ignored=1
>> fmov1n3.sn.dtcorp.com      : ok=68   changed=27   unreachable=0
>> failed=0    skipped=163  rescued=0    ignored=1
>>
>> Please check /var/log/cockpit/ovirt-dashboard/gluster-deployment.log for
>> more informations.
>>
>> I doubled back to Strahil's recommendation to restart Gluster and enable
>> granular-entry-heal.  This fails; for example:
>>
>> [root@host1 ~]# gluster volume heal data granular-entry-heal enable
>> One or more bricks could be down. Please execute the command again after
>> bringing all bricks online and finishing any pending heals
>> Volume heal failed.
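
If the heal subcommand keeps failing, one variant I have not yet verified
(the option name should be checked against the Gluster documentation for
this release) is setting the volume option directly rather than going
through the heal subcommand.  A dry-run sketch that only prints the
commands for the three volumes in this deployment:

```shell
# Print (not execute) the volume-set form of enabling granular entry
# heal for each HCI volume; remove the echo to actually run the commands.
for vol in engine data vmstore; do
    echo gluster volume set "$vol" cluster.granular-entry-heal on
done
```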
>>
>> I have followed Ritesh's suggestion:
>>
>> [root@host1~]# ansible-playbook
>> /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/tasks/gluster_cleanup.yml
>> -i /etc/ansible/hc_wizard_inventory.yml
>>
>> which appeared to execute successfully:
>>
>> PLAY RECAP
>> **********************************************************************************************************
>> fmov1n1.sn.dtcorp.com      : ok=11   changed=2    unreachable=0
>> failed=0    skipped=2    rescued=0    ignored=0
>> fmov1n2.sn.dtcorp.com      : ok=9    changed=1    unreachable=0
>> failed=0    skipped=1    rescued=0    ignored=0
>> fmov1n3.sn.dtcorp.com      : ok=9    changed=1    unreachable=0
>> failed=0    skipped=1    rescued=0    ignored=0
>>
>
> So after this, have you tried the Gluster deployment again?
>
>>
>> Here is the info Strahil requested when I first reported this issue on
>> December 18th, re-run today, January 11:
>>
>> [root@host1 ~]# gluster pool list
>> UUID                                    Hostname                State
>> 4964020a-9632-43eb-9468-798920e98559    host2.domain.com   Connected
>> f0718e4f-1ac6-4b82-a8d7-a4d31cd0f38b    host3.domain.com   Connected
>> 6ba94e82-579c-4ae2-b3c5-bef339c6f795    localhost               Connected
>> [root@host1 ~]# gluster volume list
>> data
>> engine
>> vmstore
>> [root@host1 ~]# for i in $(gluster volume list); do gluster volume
>> status $i; gluster volume info $i; echo
>> "###########################################################################################################";done
>> Status of volume: data
>> Gluster process                             TCP Port  RDMA Port  Online
>> Pid
>>
>> ------------------------------------------------------------------------------
>> Brick host1.domain.com:/gluster_bricks
>> /data/data                                  49153     0          Y
>>  406272
>> Brick host2.domain.com:/gluster_bricks
>> /data/data                                  49153     0          Y
>>  360300
>> Brick host3.domain.com:/gluster_bricks
>> /data/data                                  49153     0          Y
>>  360082
>> Self-heal Daemon on localhost               N/A       N/A        Y
>>  413227
>> Self-heal Daemon on host2.domain.com   N/A       N/A        Y
>>  360223
>> Self-heal Daemon on host3.domain.com   N/A       N/A        Y
>>  360003
>>
>> Task Status of Volume data
>>
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>>
>> Volume Name: data
>> Type: Replicate
>> Volume ID: ed65a922-bd85-4574-ba21-25b3755acbce
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: host1.domain.com:/gluster_bricks/data/data
>> Brick2: host2.domain.com:/gluster_bricks/data/data
>> Brick3: host3.domain.com:/gluster_bricks/data/data
>> Options Reconfigured:
>> performance.client-io-threads: on
>> nfs.disable: on
>> storage.fips-mode-rchecksum: on
>> transport.address-family: inet
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.low-prio-threads: 32
>> network.remote-dio: off
>> cluster.eager-lock: enable
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> cluster.data-self-heal-algorithm: full
>> cluster.locking-scheme: granular
>> cluster.shd-max-threads: 8
>> cluster.shd-wait-qlength: 10000
>> features.shard: on
>> user.cifs: off
>> cluster.choose-local: off
>> client.event-threads: 4
>> server.event-threads: 4
>> storage.owner-uid: 36
>> storage.owner-gid: 36
>> network.ping-timeout: 30
>> performance.strict-o-direct: on
>>
>> ###########################################################################################################
>> Status of volume: engine
>> Gluster process                             TCP Port  RDMA Port  Online
>> Pid
>>
>> ------------------------------------------------------------------------------
>> Brick host1.domain.com:/gluster_bricks
>> /engine/engine                              49152     0          Y
>>  404563
>> Brick host2.domain.com:/gluster_bricks
>> /engine/engine                              49152     0          Y
>>  360202
>> Brick host3.domain.com:/gluster_bricks
>> /engine/engine                              49152     0          Y
>>  359982
>> Self-heal Daemon on localhost               N/A       N/A        Y
>>  413227
>> Self-heal Daemon on host3.domain.com   N/A       N/A        Y
>>  360003
>> Self-heal Daemon on host2.domain.com   N/A       N/A        Y
>>  360223
>>
>> Task Status of Volume engine
>>
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>>
>> Volume Name: engine
>> Type: Replicate
>> Volume ID: 45d4ec84-38a1-41ff-b8ec-8b00eb658908
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: host1.domain.com:/gluster_bricks/engine/engine
>> Brick2: host2.domain.com:/gluster_bricks/engine/engine
>> Brick3: host3.domain.com:/gluster_bricks/engine/engine
>> Options Reconfigured:
>> performance.client-io-threads: on
>> nfs.disable: on
>> storage.fips-mode-rchecksum: on
>> transport.address-family: inet
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.low-prio-threads: 32
>> network.remote-dio: off
>> cluster.eager-lock: enable
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> cluster.data-self-heal-algorithm: full
>> cluster.locking-scheme: granular
>> cluster.shd-max-threads: 8
>> cluster.shd-wait-qlength: 10000
>> features.shard: on
>> user.cifs: off
>> cluster.choose-local: off
>> client.event-threads: 4
>> server.event-threads: 4
>> storage.owner-uid: 36
>> storage.owner-gid: 36
>> network.ping-timeout: 30
>> performance.strict-o-direct: on
>>
>> ###########################################################################################################
>> Status of volume: vmstore
>> Gluster process                             TCP Port  RDMA Port  Online
>> Pid
>>
>> ------------------------------------------------------------------------------
>> Brick host1.domain.com:/gluster_bricks
>> /vmstore/vmstore                            49154     0          Y
>>  407952
>> Brick host2.domain.com:/gluster_bricks
>> /vmstore/vmstore                            49154     0          Y
>>  360389
>> Brick host3.domain.com:/gluster_bricks
>> /vmstore/vmstore                            49154     0          Y
>>  360176
>> Self-heal Daemon on localhost               N/A       N/A        Y
>>  413227
>> Self-heal Daemon on host2.domain.com   N/A       N/A        Y
>>  360223
>> Self-heal Daemon on host3.domain.com   N/A       N/A        Y
>>  360003
>>
>> Task Status of Volume vmstore
>>
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>>
>> Volume Name: vmstore
>> Type: Replicate
>> Volume ID: 27c8346c-0374-4108-a33a-0024007a9527
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: host1.domain.com:/gluster_bricks/vmstore/vmstore
>> Brick2: host2.domain.com:/gluster_bricks/vmstore/vmstore
>> Brick3: host3.domain.com:/gluster_bricks/vmstore/vmstore
>> Options Reconfigured:
>> performance.client-io-threads: on
>> nfs.disable: on
>> storage.fips-mode-rchecksum: on
>> transport.address-family: inet
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.low-prio-threads: 32
>> network.remote-dio: off
>> cluster.eager-lock: enable
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> cluster.data-self-heal-algorithm: full
>> cluster.locking-scheme: granular
>> cluster.shd-max-threads: 8
>> cluster.shd-wait-qlength: 10000
>> features.shard: on
>> user.cifs: off
>> cluster.choose-local: off
>> client.event-threads: 4
>> server.event-threads: 4
>> storage.owner-uid: 36
>> storage.owner-gid: 36
>> network.ping-timeout: 30
>> performance.strict-o-direct: on
>>
>> ###########################################################################################################
>> [root@host1 ~]#
>>
>> Again, further suggestions for troubleshooting are VERY much appreciated!
>>
>> Respectfully,
>> Charles
>> _______________________________________________
>> Users mailing list -- [email protected]
>> To unsubscribe send an email to [email protected]
>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>> oVirt Code of Conduct:
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
>> https://lists.ovirt.org/archives/list/[email protected]/message/A2NR63KWDQSXFS2CRWGRF4HNIR4YDX6K/
>>
>
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/5XMBLK57C5B67QQTNONWO7RVCMPKJCFZ/
