Hi Jiri,

Your problem looks pretty similar to mine, see:
https://lists.gluster.org/pipermail/gluster-users/2021-February/039134.html

Any chance you also see the XFS errors in the brick logs? For me the situation improved once I disabled brick multiplexing, but I don't see that in your volume configuration.
Cheers,
Olaf

On Thu, 8 Jul 2021 at 12:28, Jiří Sléžka <[email protected]> wrote:
> Hello gluster community,
>
> I am new to this list but have been using GlusterFS for a long time as our SDS
> solution for storing 80+ TiB of data. I'm also using GlusterFS for a small
> 3-node HCI cluster with oVirt 4.4.6 and CentOS 8 (not Stream yet).
> The GlusterFS version here is 8.5-2.el8.x86_64.
>
> From time to time (I believe) a random brick on a random host goes down
> because of the health check. It looks like this:
>
> [root@ovirt-hci02 ~]# grep "posix_health_check" /var/log/glusterfs/bricks/*
> /var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07 07:13:37.408184] M [MSGID: 113075] [posix-helpers.c:2214:posix_health_check_thread_proc] 0-vms-posix: health-check failed, going down
> /var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07 07:13:37.408407] M [MSGID: 113075] [posix-helpers.c:2232:posix_health_check_thread_proc] 0-vms-posix: still alive! -> SIGTERM
> /var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07 16:11:14.518971] M [MSGID: 113075] [posix-helpers.c:2214:posix_health_check_thread_proc] 0-vms-posix: health-check failed, going down
> /var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07 16:11:14.519200] M [MSGID: 113075] [posix-helpers.c:2232:posix_health_check_thread_proc] 0-vms-posix: still alive! -> SIGTERM
>
> On the other host:
>
> [root@ovirt-hci01 ~]# grep "posix_health_check" /var/log/glusterfs/bricks/*
> /var/log/glusterfs/bricks/gluster_bricks-engine-engine.log:[2021-07-05 13:15:51.983327] M [MSGID: 113075] [posix-helpers.c:2214:posix_health_check_thread_proc] 0-engine-posix: health-check failed, going down
> /var/log/glusterfs/bricks/gluster_bricks-engine-engine.log:[2021-07-05 13:15:51.983728] M [MSGID: 113075] [posix-helpers.c:2232:posix_health_check_thread_proc] 0-engine-posix: still alive! -> SIGTERM
> /var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-05 01:53:35.769129] M [MSGID: 113075] [posix-helpers.c:2214:posix_health_check_thread_proc] 0-vms-posix: health-check failed, going down
> /var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-05 01:53:35.769819] M [MSGID: 113075] [posix-helpers.c:2232:posix_health_check_thread_proc] 0-vms-posix: still alive! -> SIGTERM
>
> I cannot link these errors to any storage/fs issue (in dmesg or
> /var/log/messages), and the brick devices look healthy (smartd).
>
> I can force-start the brick with
>
> gluster volume start vms|engine force
>
> and after some healing everything works fine for a few days.
>
> Has anybody observed this behavior?
>
> The vms volume has this structure (two bricks per host, each a separate
> JBOD SSD disk); the engine volume has one brick on each host.
>
> gluster volume info vms
>
> Volume Name: vms
> Type: Distributed-Replicate
> Volume ID: 52032ec6-99d4-4210-8fb8-ffbd7a1e0bf7
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x 3 = 6
> Transport-type: tcp
> Bricks:
> Brick1: 10.0.4.11:/gluster_bricks/vms/vms
> Brick2: 10.0.4.13:/gluster_bricks/vms/vms
> Brick3: 10.0.4.12:/gluster_bricks/vms/vms
> Brick4: 10.0.4.11:/gluster_bricks/vms2/vms2
> Brick5: 10.0.4.13:/gluster_bricks/vms2/vms2
> Brick6: 10.0.4.12:/gluster_bricks/vms2/vms2
> Options Reconfigured:
> cluster.granular-entry-heal: enable
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> user.cifs: off
> network.ping-timeout: 30
> network.remote-dio: off
> performance.strict-o-direct: on
> performance.low-prio-threads: 32
> features.shard: on
> storage.owner-gid: 36
> storage.owner-uid: 36
> transport.address-family: inet
> storage.fips-mode-rchecksum: on
> nfs.disable: on
> performance.client-io-threads: off
>
> Cheers,
>
> Jiri
>
> ________
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> [email protected]
> https://lists.gluster.org/mailman/listinfo/gluster-users
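To line up these health-check failures across hosts, the quoted grep output can be parsed into (host, brick log, timestamp, message) records and merged into a single timeline. Below is a minimal Python sketch. It assumes the exact line format shown in the quoted logs (MSGID 113075) and that the hosts' clocks agree. The names `parse_health_check_lines`, `failures_per_brick`, `timeline` and `HealthEvent` are hypothetical helpers, not part of GlusterFS.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Iterable, List, NamedTuple

# Matches grep output lines like the ones quoted above (MSGID 113075):
# <brick log>:[YYYY-MM-DD HH:MM:SS.ffffff] M [MSGID: 113075] [<src>:<line>:<func>] <xlator>: <message>
LINE_RE = re.compile(
    r"^(?P<log>[^:]+):\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{1,6})\] "
    r"M \[MSGID: 113075\] \[[^\]]+\] (?P<xlator>\S+): (?P<msg>.*)$"
)


class HealthEvent(NamedTuple):
    host: str       # label supplied by the caller, e.g. "ovirt-hci02"
    brick_log: str  # path of the brick log file the line came from
    when: datetime  # timestamp as written in the log (no time zone attached)
    xlator: str     # e.g. "0-vms-posix"
    message: str    # e.g. "health-check failed, going down"


def parse_health_check_lines(lines: Iterable[str], host: str) -> List[HealthEvent]:
    """Turn `grep posix_health_check` output into HealthEvent records."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # ignore lines that do not follow the expected format
        when = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        events.append(HealthEvent(host, m["log"], when, m["xlator"], m["msg"]))
    return events


def failures_per_brick(events: Iterable[HealthEvent]) -> Counter:
    """Count 'health-check failed' events per (host, brick log)."""
    return Counter(
        (e.host, e.brick_log)
        for e in events
        if e.message.startswith("health-check failed")
    )


def timeline(*per_host: List[HealthEvent]) -> List[HealthEvent]:
    """Merge events from several hosts and sort them by timestamp."""
    return sorted((e for events in per_host for e in events), key=lambda e: e.when)
```

Save the grep output from each host (for example one file per host), parse each with its own host label, and call `timeline(...)` on the results. The merged, time-ordered list is easier to compare against dmesg and /var/log/messages on the same hosts.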
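For reference, GlusterFS groups the bricks of a distributed-replicate volume into replica sets in the order they are listed. The "2 x 3 = 6" vms volume quoted above therefore consists of two replica sets of three bricks each, and each set has one brick on each of the three hosts. When one brick goes down, the other two copies in its set remain, and the downed brick is healed after it is force-started. The Python sketch below shows only that grouping. `replica_sets` is a hypothetical helper, and the brick list is copied from the quoted `gluster volume info vms` output.

```python
from typing import List


def replica_sets(bricks: List[str], replica_count: int) -> List[List[str]]:
    """Group bricks, in `gluster volume info` order, into replica sets.

    Consecutive runs of `replica_count` bricks form one replica set.
    """
    if replica_count <= 0 or len(bricks) % replica_count != 0:
        raise ValueError("brick count must be a positive multiple of the replica count")
    return [bricks[i:i + replica_count] for i in range(0, len(bricks), replica_count)]


# Brick order copied from `gluster volume info vms` in the quoted message.
VMS_BRICKS = [
    "10.0.4.11:/gluster_bricks/vms/vms",
    "10.0.4.13:/gluster_bricks/vms/vms",
    "10.0.4.12:/gluster_bricks/vms/vms",
    "10.0.4.11:/gluster_bricks/vms2/vms2",
    "10.0.4.13:/gluster_bricks/vms2/vms2",
    "10.0.4.12:/gluster_bricks/vms2/vms2",
]

if __name__ == "__main__":
    for n, subvol in enumerate(replica_sets(VMS_BRICKS, 3)):
        # Collect the host part of each "host:/path" brick to show the spread.
        hosts = sorted({brick.split(":", 1)[0] for brick in subvol})
        print(f"replica set {n}: {subvol} on hosts {hosts}")
```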
