Hi Ravi,

I'm not sure what was bothering the SHD, but I tried restarting glusterd and the glusterfsd processes again and this time the SHD came up. I'm sorry I don't have any more details about what could have caused the issue. If you want me to check or search the logs for specific messages that might explain it, please let me know.
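For reference, this is roughly how I'd search the daemon logs on the upgraded node for anything relevant; I'm assuming the default log locations under /var/log/glusterfs, so the paths may need adjusting:

# grep -E '\] [EW] \[' /var/log/glusterfs/glusterd.log | tail -n 50
# grep -E '\] [EW] \[' /var/log/glusterfs/glustershd.log | tail -n 50

This just pulls the most recent warning (W) and error (E) entries, since everything else in these logs tends to be Info-level noise.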
Thanks for your assistance.

Best Olaf

On Thu, 26 Nov 2020 at 11:53, Ravishankar N <[email protected]> wrote:

> On 26/11/20 4:00 pm, Olaf Buitelaar wrote:
>
> Hi Ravi,
>
> I could try that, but I can only try a setup on VMs, and will not be able to set up an environment like our production environment, which runs on physical machines and has actual production load etc. So the two setups would be quite different. Personally I think it would be best to debug the actual machines instead of trying to reproduce it, since reproducing the issue on the physical machines is just a matter of swapping the repositories and upgrading the packages. Let me know what you think?
>
> Physical machines or VMs - anything is fine. The only thing is I cannot guarantee quick responses, so if it is a production machine, it will be an issue for you. So any setup you can use for experimenting is fine. You don't need any clients for the testing. Just create a 1x2 replica volume using 2 nodes and start it. Then upgrade one node and see if shd and bricks come up on that node.
>
> -Ravi
>
> Thanks Olaf
>
> On Thu, 26 Nov 2020 at 02:43, Ravishankar N <[email protected]> wrote:
>
>> On 25/11/20 7:17 pm, Olaf Buitelaar wrote:
>>
>> Hi Ravi,
>>
>> Thanks for checking. Unfortunately this is our production system. What I've done is simply change the yum repo from gluster-6 to http://mirror.centos.org/centos/$releasever/storage/$basearch/gluster-7/, run a yum upgrade, and restart the glusterd process several times; I've also tried rebooting the machine. I didn't touch the op-version yet, which is still at 60000; usually I only do this once all nodes are upgraded and running stable. We're running multiple volumes with different configurations, but the shd doesn't start on the upgraded nodes for any of them. Is there anything further I could check or do to get to the bottom of this?
>>
>> Hi Olaf, like I said, would it be possible to create a test setup to see if you can recreate it?
>> Regards,
>> Ravi
>>
>> Thanks Olaf
>>
>> On Wed, 25 Nov 2020 at 14:14, Ravishankar N <[email protected]> wrote:
>>
>>> On 25/11/20 5:50 pm, Olaf Buitelaar wrote:
>>>
>>> Hi Ashish,
>>>
>>> Thank you for looking into this. I indeed also suspect it has something to do with the 7.x client, because on the 6.x clients the issue doesn't really seem to occur. I would love to update everything to 7.x, but since the self-heal daemons (https://lists.gluster.org/pipermail/gluster-users/2020-November/038917.html) won't start, I halted the full upgrade.
>>>
>>> Olaf, based on your email, I did try upgrading one node of a 3-node replica 3 setup from 6.10 to 7.8 on my test VMs, and I found that the self-heal daemon (and the bricks) came online after I restarted glusterd post-upgrade on that node (I did not touch the op-version), and I did not spend further time on it. So I don't think the problem is related to the shd mux changes I referred to. But if you have a test setup where you can reproduce this, please raise a github issue with the details.
>>> Thanks,
>>> Ravi
>>>
>>> Hopefully that issue will be addressed in the upcoming release. Once I have everything running on the same version I'll check whether the issue still occurs and reach out if that's the case.
>>>
>>> Thanks Olaf
>>>
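For reference, the test Ravi describes above would look roughly like this on two scratch nodes; node1/node2 and the brick paths are placeholders, not our actual hosts:

# gluster volume create testvol replica 2 node1:/bricks/testvol/brick node2:/bricks/testvol/brick force
# gluster volume start testvol
# gluster volume status testvol

After upgrading the packages on one node and restarting glusterd there, "gluster volume status testvol" should again list that node's brick and its Self-heal Daemon as online; "gluster volume get all cluster.op-version" shows the cluster op-version, which stays untouched during the test.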
>>> On Wed, 25 Nov 2020 at 10:42, Ashish Pandey <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I checked the statedump and found some very high memory allocations:
>>>>
>>>> grep -rwn "num_allocs" glusterdump.17317.dump.1605* | cut -d'=' -f2 | sort
>>>>
>>>> 30003616
>>>> 30003616
>>>> 3305
>>>> 3305
>>>> 36960008
>>>> 36960008
>>>> 38029944
>>>> 38029944
>>>> 38450472
>>>> 38450472
>>>> 39566824
>>>> 39566824
>>>> 4
>>>>
>>>> I did check the corresponding lines in the statedump and it could be happening in protocol/client. However, I did not find anything suspicious in my quick code exploration.
>>>> I would suggest upgrading all the nodes to the latest version first, then resuming your workload and seeing whether memory usage still grows. That way it will also be easier to debug this issue.
>>>>
>>>> ---
>>>> Ashish
>>>>
>>>> ------------------------------
>>>> From: "Olaf Buitelaar" <[email protected]>
>>>> To: "gluster-users" <[email protected]>
>>>> Sent: Thursday, November 19, 2020 10:28:57 PM
>>>> Subject: [Gluster-users] possible memory leak in client/fuse mount
>>>>
>>>> Dear Gluster Users,
>>>>
>>>> I have a glusterfs process which consumes nearly all memory of the machine (~58GB):
>>>>
>>>> # ps -faxu|grep 17317
>>>> root 17317 3.1 88.9 59695516 58479708 ? Ssl Oct31 839:36 /usr/sbin/glusterfs --process-name fuse --volfile-server=10.201.0.1 --volfile-server=10.201.0.8:10.201.0.5:10.201.0.6:10.201.0.7:10.201.0.9 --volfile-id=/docker2 /mnt/docker2
>>>>
>>>> The gluster version on this machine is 7.8, but I'm currently running a mixed cluster of 6.10 and 7.8, while the full upgrade is on hold because of the self-heal daemon issue mentioned earlier.
>>>>
>>>> The affected volume info looks like:
>>>>
>>>> # gluster v info docker2
>>>>
>>>> Volume Name: docker2
>>>> Type: Distributed-Replicate
>>>> Volume ID: 4e0670a0-3d00-4360-98bd-3da844cedae5
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 3 x (2 + 1) = 9
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: 10.201.0.5:/data0/gfs/bricks/brick1/docker2
>>>> Brick2: 10.201.0.9:/data0/gfs/bricks/brick1/docker2
>>>> Brick3: 10.201.0.3:/data0/gfs/bricks/bricka/docker2 (arbiter)
>>>> Brick4: 10.201.0.6:/data0/gfs/bricks/brick1/docker2
>>>> Brick5: 10.201.0.7:/data0/gfs/bricks/brick1/docker2
>>>> Brick6: 10.201.0.4:/data0/gfs/bricks/bricka/docker2 (arbiter)
>>>> Brick7: 10.201.0.1:/data0/gfs/bricks/brick1/docker2
>>>> Brick8: 10.201.0.8:/data0/gfs/bricks/brick1/docker2
>>>> Brick9: 10.201.0.2:/data0/gfs/bricks/bricka/docker2 (arbiter)
>>>> Options Reconfigured:
>>>> performance.cache-size: 128MB
>>>> transport.address-family: inet
>>>> nfs.disable: on
>>>> cluster.brick-multiplex: on
>>>>
>>>> The issue seems to be triggered by a program called zammad, which has an init process that runs in a loop; on each cycle it re-compiles the Ruby on Rails application.
>>>>
>>>> I've attached 2 statedumps, but as I only recently noticed the high memory usage, I believe both statedumps already show an escalated state of the glusterfs process. If it's needed to also have them from the beginning, let me know. The dumps were taken about an hour apart. Also I've included the glusterd.log.
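In case it helps, the high counts can be traced back to the owning translator and allocation type in those dumps with something along these lines (the 8-digit threshold is just illustrative, chosen to match the values Ashish listed):

# grep -E -B4 'num_allocs=3[0-9]{7}' glusterdump.17317.dump.1605* | grep -E 'memusage\]|num_allocs='

The -B4 context pulls in the "[... - usage-type ... memusage]" section header above each matching counter, so the output shows which memory type the ~30-40 million allocations belong to.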
>>>> I couldn't include mnt-docker2.log since it's too large; it's littered with: "I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht".
>>>> However, I've inspected the log and it contains no error messages; all are of the Info kind and look like these:
>>>>
>>>> [2020-11-19 03:29:05.406766] I [glusterfsd-mgmt.c:2282:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing
>>>> [2020-11-19 03:29:21.271886] I [socket.c:865:__socket_shutdown] 0-docker2-client-8: intentional socket shutdown(5)
>>>> [2020-11-19 03:29:24.479738] I [socket.c:865:__socket_shutdown] 0-docker2-client-2: intentional socket shutdown(5)
>>>> [2020-11-19 03:30:12.318146] I [socket.c:865:__socket_shutdown] 0-docker2-client-5: intentional socket shutdown(5)
>>>> [2020-11-19 03:31:27.381720] I [socket.c:865:__socket_shutdown] 0-docker2-client-8: intentional socket shutdown(5)
>>>> [2020-11-19 03:31:30.579630] I [socket.c:865:__socket_shutdown] 0-docker2-client-2: intentional socket shutdown(5)
>>>> [2020-11-19 03:32:18.427364] I [socket.c:865:__socket_shutdown] 0-docker2-client-5: intentional socket shutdown(5)
>>>>
>>>> The rename messages look like these:
>>>>
>>>> [2020-11-19 03:29:05.402663] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5.tmp.eVcE5D (fe083b7e-b0d5-485c-8666-e1f7cdac33e2) (hash=docker2-replicate-2/cache=docker2-replicate-2) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5 ((null)) (hash=docker2-replicate-2/cache=<nul>)
>>>> [2020-11-19 03:29:05.410972] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff.tmp.AdDTLu (b1edadad-1d48-4bf4-be85-ffbe9d69d338) (hash=docker2-replicate-1/cache=docker2-replicate-1) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff ((null)) (hash=docker2-replicate-2/cache=<nul>)
>>>> [2020-11-19 03:29:05.420064] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3.tmp.QKmxul (31f80fcb-977c-433b-9259-5fdfcad1171c) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3 ((null)) (hash=docker2-replicate-0/cache=<nul>)
>>>> [2020-11-19 03:29:05.427537] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009.tmp.qLUMec (e2fdf971-731f-4765-80e8-3165433488ea) (hash=docker2-replicate-2/cache=docker2-replicate-2) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009 ((null)) (hash=docker2-replicate-1/cache=<nul>)
>>>> [2020-11-19 03:29:05.440576] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36.tmp.4qvl22 (3e0bc6d1-13ac-47c6-b221-1256b4b506ef) (hash=docker2-replicate-2/cache=docker2-replicate-2) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36 ((null)) (hash=docker2-replicate-1/cache=<nul>)
>>>> [2020-11-19 03:29:05.452407] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e.tmp.iIweTT (9685b5f3-4b14-4050-9b00-1163856239b5) (hash=docker2-replicate-1/cache=docker2-replicate-1) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e ((null)) (hash=docker2-replicate-0/cache=<nul>)
>>>> [2020-11-19 03:29:05.460720] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025.tmp.0W7jMK (d0a8d0a4-c783-45db-bb4a-68e24044d830) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025 ((null)) (hash=docker2-replicate-1/cache=<nul>)
>>>> [2020-11-19 03:29:05.468800] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb.tmp.2yXtHB (e5b61ef5-a3c2-4a2c-aa47-c377a6c090d7) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb ((null)) (hash=docker2-replicate-0/cache=<nul>)
>>>> [2020-11-19 03:29:05.476745] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7.tmp.gSkiEs (17181a40-f9b2-438f-9dfc-7bb159c516e6) (hash=docker2-replicate-2/cache=docker2-replicate-2) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7 ((null)) (hash=docker2-replicate-0/cache=<nul>)
>>>> [2020-11-19 03:29:05.486729] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a.tmp.sVT0Dj (cb6b1d52-b1c0-420c-86b7-2ceb8e8e73db) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a ((null)) (hash=docker2-replicate-1/cache=<nul>)
>>>> [2020-11-19 03:29:05.495115] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b.tmp.QdPTFa (d8450d9e-62a7-4fd5-9dd2-e072e318d9a0) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b ((null)) (hash=docker2-replicate-1/cache=<nul>)
>>>> [2020-11-19 03:29:05.503424] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0.tmp.s1xUJ1 (ffc57a77-8b91-4264-8e2d-a9966f0f37ef) (hash=docker2-replicate-1/cache=docker2-replicate-1) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0 ((null)) (hash=docker2-replicate-2/cache=<nul>)
>>>> [2020-11-19 03:29:05.513532] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad.tmp.A5DzQS (5a595a65-372d-4377-b547-2c4e23f7be3a) (hash=docker2-replicate-1/cache=docker2-replicate-1) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad ((null)) (hash=docker2-replicate-0/cache=<nul>)
>>>> [2020-11-19 03:29:05.526885] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe.tmp.IMXg0J (2fa99fcd-64f8-4934-aeda-b356816f1132) (hash=docker2-replicate-2/cache=docker2-replicate-2) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe ((null)) (hash=docker2-replicate-2/cache=<nul>)
>>>> [2020-11-19 03:29:05.537637] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b.tmp.Y2L0cB (db24d7bf-4a06-4356-a52e-1ab9537d1c3a) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b ((null)) (hash=docker2-replicate-1/cache=<nul>)
>>>> [2020-11-19 03:29:05.547878] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5.tmp.u47rss (b12f041b-5bbd-4e3d-b700-8f673830393f) (hash=docker2-replicate-1/cache=docker2-replicate-1) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5 ((null)) (hash=docker2-replicate-1/cache=<nul>)
>>>>
>>>> If I can provide any more information, please let me know.
>>>>
>>>> Thanks Olaf
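If dumps from an earlier, non-escalated stage would be useful, I can capture fresh ones; for the fuse mount that is done by sending SIGUSR1 to the client process, with the dumps landing in the directory reported by gluster --print-statedumpdir (usually /var/run/gluster). Roughly:

# gluster --print-statedumpdir
# kill -USR1 17317
# ps -o rss= -p 17317

The ps line records the resident set size in KiB at the moment of each dump, so growth can be correlated with the dump contents over time. For mnt-docker2.log I can also filter out the rename noise and keep only warnings/errors, e.g. "grep -v 'MSGID: 109066' /var/log/glusterfs/mnt-docker2.log | grep -E '\] [EW] \['", and send that instead of the full log.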
________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users
