Sorry for the delay. Somehow Gmail decided to put almost all email from this list into spam. Anyway, yes, I checked the processes: the Gluster processes are in 'R' state, the others in 'S' state. You can find the 'top -H' output in the first message. We're running glusterfs 6.8 on CentOS 7.8 with Linux kernel 4.19.
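For reference, a quick way to list every thread in 'R' (running) or 'D' (uninterruptible sleep) state is a sketch like the following, using standard procps ps:

    # print the header plus any thread whose state starts with R or D
    ps -eLo pid,tid,stat,pcpu,comm | awk 'NR==1 || $3 ~ /^[RD]/'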
Thanks.

On Tue, 23 Jun 2020 at 21:49, Strahil Nikolov <[email protected]> wrote:
> What is the OS and its version?
> I have seen similar behaviour (with a different workload) on RHEL 7.6 (and
> below).
>
> Have you checked which processes are in 'R' or 'D' state on st2a?
>
> Best Regards,
> Strahil Nikolov
>
> On 23 June 2020 19:31:12 GMT+03:00, Pavel Znamensky <[email protected]> wrote:
> >Hi all,
> >There's something strange with one of our clusters running glusterfs
> >6.8: it's quite slow and one node is overloaded.
> >It is a distributed cluster of four servers with identical
> >specs/OS/versions:
> >
> >Volume Name: st2
> >Type: Distributed-Replicate
> >Volume ID: 4755753b-37c4-403b-b1c8-93099bfc4c45
> >Status: Started
> >Snapshot Count: 0
> >Number of Bricks: 2 x 2 = 4
> >Transport-type: tcp
> >Bricks:
> >Brick1: st2a:/vol3/st2
> >Brick2: st2b:/vol3/st2
> >Brick3: st2c:/vol3/st2
> >Brick4: st2d:/vol3/st2
> >Options Reconfigured:
> >cluster.rebal-throttle: aggressive
> >nfs.disable: on
> >performance.readdir-ahead: off
> >transport.address-family: inet6
> >performance.quick-read: off
> >performance.cache-size: 1GB
> >performance.io-cache: on
> >performance.io-thread-count: 16
> >cluster.data-self-heal-algorithm: full
> >network.ping-timeout: 20
> >server.event-threads: 2
> >client.event-threads: 2
> >cluster.readdir-optimize: on
> >performance.read-ahead: off
> >performance.parallel-readdir: on
> >cluster.self-heal-daemon: enable
> >storage.health-check-timeout: 20
> >
> >op.version for this cluster remains 50400.
> >
> >st2a is a replica of st2b, and st2c is a replica of st2d.
> >All 50 of our clients mount this volume via FUSE, and in contrast to
> >our other cluster, this one is terribly slow.
> >The interesting thing here is that HDD and network utilization are
> >very low on the one hand, while one server is quite overloaded on the
> >other.
> >Also, there are no files to be healed according to `gluster volume
> >heal st2 info`.
> >Load average across the servers:
> >st2a:
> >load average: 28,73, 26,39, 27,44
> >st2b:
> >load average: 0,24, 0,46, 0,76
> >st2c:
> >load average: 0,13, 0,20, 0,27
> >st2d:
> >load average: 2,93, 2,11, 1,50
> >
> >If we stop glusterfs on the st2a server, the cluster works as fast as
> >we expect.
> >Previously the cluster ran version 5.x and there were no such
> >problems.
> >
> >Interestingly, almost all CPU usage on st2a is generated by "system"
> >load.
> >The most CPU-intensive process is glusterfsd.
> >`top -H` for the glusterfsd process shows this:
> >
> >  PID USER PR NI    VIRT   RES  SHR S %CPU %MEM     TIME+ COMMAND
> >13894 root 20  0 2172892 96488 9056 R 74,0  0,1 122:09.14 glfs_iotwr00a
> >13888 root 20  0 2172892 96488 9056 R 73,7  0,1 121:38.26 glfs_iotwr004
> >13891 root 20  0 2172892 96488 9056 R 73,7  0,1 121:53.83 glfs_iotwr007
> >13920 root 20  0 2172892 96488 9056 R 73,0  0,1 122:11.27 glfs_iotwr00f
> >13897 root 20  0 2172892 96488 9056 R 68,3  0,1 121:09.82 glfs_iotwr00d
> >13896 root 20  0 2172892 96488 9056 R 68,0  0,1 122:03.99 glfs_iotwr00c
> >13868 root 20  0 2172892 96488 9056 R 67,7  0,1 122:42.55 glfs_iotwr000
> >13889 root 20  0 2172892 96488 9056 R 67,3  0,1 122:17.02 glfs_iotwr005
> >13887 root 20  0 2172892 96488 9056 R 67,0  0,1 122:29.88 glfs_iotwr003
> >13885 root 20  0 2172892 96488 9056 R 65,0  0,1 122:04.85 glfs_iotwr001
> >13892 root 20  0 2172892 96488 9056 R 55,0  0,1 121:15.23 glfs_iotwr008
> >13890 root 20  0 2172892 96488 9056 R 54,7  0,1 121:27.88 glfs_iotwr006
> >13895 root 20  0 2172892 96488 9056 R 54,0  0,1 121:28.35 glfs_iotwr00b
> >13893 root 20  0 2172892 96488 9056 R 53,0  0,1 122:23.12 glfs_iotwr009
> >13898 root 20  0 2172892 96488 9056 R 52,0  0,1 122:30.67 glfs_iotwr00e
> >13886 root 20  0 2172892 96488 9056 R 41,3  0,1 121:26.97 glfs_iotwr002
> >13878 root 20  0 2172892 96488 9056 S  1,0  0,1   1:20.34 glfs_rpcrqhnd
> >13840 root 20  0 2172892 96488 9056 S  0,7  0,1   0:51.54 glfs_epoll000
> >13841 root 20  0 2172892 96488 9056 S  0,7  0,1   0:51.14 glfs_epoll001
> >13877 root 20  0 2172892 96488 9056 S  0,3  0,1   1:20.02 glfs_rpcrqhnd
> >13833 root 20  0 2172892 96488 9056 S  0,0  0,1   0:00.00 glusterfsd
> >13834 root 20  0 2172892 96488 9056 S  0,0  0,1   0:00.14 glfs_timer
> >13835 root 20  0 2172892 96488 9056 S  0,0  0,1   0:00.00 glfs_sigwait
> >13836 root 20  0 2172892 96488 9056 S  0,0  0,1   0:00.16 glfs_memsweep
> >13837 root 20  0 2172892 96488 9056 S  0,0  0,1   0:00.05 glfs_sproc0
> >
> >Also, I couldn't find any relevant messages in the log files.
> >Honestly, I don't know what to do. Does anyone know how to debug or
> >fix this behaviour?
> >
> >Best regards,
> >Pavel
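For what it's worth, since the quoted config and heal info look clean, one standard way to see which file operations are loading the bricks on st2a is gluster's built-in profiler (a sketch; 'st2' is the volume from the thread above):

    gluster volume profile st2 start
    # ... reproduce the slow workload for a few minutes ...
    gluster volume profile st2 info    # per-brick latency and call counts for each FOP
    gluster volume profile st2 stop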
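And since almost all of the CPU time on st2a is "system" time, a kernel-side profile of one of the spinning io-worker threads might show where it is busy. A sketch using perf and /proc, with the PID/TID taken from the top output quoted above (run as root):

    perf top -t 13894                   # live hotspots for the glfs_iotwr00a thread
    cat /proc/13833/task/13894/stack    # kernel stack of TID 13894 under glusterfsd PID 13833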
