On 25/11/20 7:17 pm, Olaf Buitelaar wrote:
Hi Ravi,

Thanks for checking. Unfortunately this is our production system. What I've done is simply change the yum repo from gluster-6 to http://mirror.centos.org/centos/$releasever/storage/$basearch/gluster-7/, run a yum upgrade, and restart the glusterd process several times; I've also tried rebooting the machine. I haven't touched the op-version yet, which is still at 60000 — usually I only bump it once all nodes are upgraded and running stable. We're running multiple volumes with different configurations, but for none of the volumes does the shd start on the upgraded nodes.
Is there anything further I could check/do to get to the bottom of this?
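For reference, the repo switch described above amounts to something like this. It is shown on a scratch copy of the repo file so the edit itself is reproducible; the section name and file contents are illustrative, and on a real node the file lives under /etc/yum.repos.d/:

```shell
# Sketch of the gluster-6 -> gluster-7 repo change on a scratch copy.
repo=/tmp/CentOS-Gluster.repo
cat > "$repo" <<'EOF'
[centos-gluster6]
name=CentOS Storage SIG - Gluster 6
baseurl=http://mirror.centos.org/centos/$releasever/storage/$basearch/gluster-6/
EOF
sed -i 's|gluster-6/|gluster-7/|' "$repo"
grep baseurl "$repo"
# Then, per node: yum upgrade 'glusterfs*' && systemctl restart glusterd,
# leaving cluster.op-version untouched until every node runs 7.x.
```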

Hi Olaf, like I said, would it be possible to create a test setup to see if you can recreate it?

Regards,
Ravi

Thanks Olaf

On Wed, 25 Nov 2020 at 14:14, Ravishankar N <[email protected]> wrote:


    On 25/11/20 5:50 pm, Olaf Buitelaar wrote:
    Hi Ashish,

    Thank you for looking into this. I indeed also suspect it has
    something to do with the 7.x client, because on the 6.x clients
    the issue doesn't really seem to occur.
    I would love to update everything to 7.x, but since the self-heal
    daemons won't start
    (https://lists.gluster.org/pipermail/gluster-users/2020-November/038917.html),
    I halted the full upgrade.

    Olaf, based on your email, I did try to upgrade one node of a
    3-node replica 3 setup from 6.10 to 7.8 on my test VMs, and I found
    that the self-heal daemon (and the bricks) came online after I
    restarted glusterd post-upgrade on that node (I did not touch the
    op-version), so I did not spend more time on it. I therefore don't
    think the problem is related to the shd mux changes I referred to,
    but if you have a test setup where you can reproduce this, please
    raise a GitHub issue with the details.

    Thanks,
    Ravi
    Hopefully that issue will be addressed in the upcoming release.
    Once I've got everything running on the same version, I'll check
    whether the issue still occurs and reach out if that's the case.

    Thanks Olaf

    On Wed, 25 Nov 2020 at 10:42, Ashish Pandey
    <[email protected]> wrote:


        Hi,

        I checked the statedump and found some very high memory
        allocations:
        grep -rwn "num_allocs" glusterdump.17317.dump.1605* | cut -d'=' -f2 | sort

        30003616
        30003616
        3305
        3305
        36960008
        36960008
        38029944
        38029944
        38450472
        38450472
        39566824
        39566824
        4
        I did check those lines in the statedump, and it could be
        happening in protocol/client. However, I did not find anything
        suspicious in my quick code exploration.
        I would suggest upgrading all the nodes to the latest version,
        then resuming your workload and checking whether memory usage
        is still high.
        That way it will also be easier to debug this issue.
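For anyone following along, a sketch of summarizing such a dump per allocator rather than as a bare list of counts. The sample file below only mimics the statedump memusage layout (the section names are examples); on a live node the dumps are written under /var/run/gluster/, and `kill -USR1 <pid>` triggers a fresh one:

```shell
# Build a tiny file mimicking the statedump memusage layout, then pair each
# num_allocs value with the section header above it and sort numerically so
# the biggest allocators come last.
cat > /tmp/glusterdump.sample <<'EOF'
[mount/fuse.fuse - usage-type gf_common_mt_char memusage]
size=1024
num_allocs=3305
[protocol/client.docker2-client-0 - usage-type gf_common_mt_asprintf memusage]
size=987654
num_allocs=30003616
EOF
awk -F= '/^\[/{sec=$0} $1=="num_allocs"{print $2, sec}' /tmp/glusterdump.sample | sort -n
```

Note the `sort -n`: without it (as in the plain `sort` above) the counts are ordered lexically, which is why a lone "4" can sort after "39566824".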

        ---
        Ashish

        ------------------------------------------------------------------------
        *From: *"Olaf Buitelaar" <[email protected]>
        *To: *"gluster-users" <[email protected]>
        *Sent: *Thursday, November 19, 2020 10:28:57 PM
        *Subject: *[Gluster-users] possible memory leak in
        client/fuse mount

        Dear Gluster Users,

        I've got a glusterfs process which consumes nearly all the
        memory of the machine (~58GB):

        # ps -faxu|grep 17317
        root     17317  3.1 88.9 59695516 58479708 ?   Ssl  Oct31
        839:36 /usr/sbin/glusterfs --process-name fuse
        --volfile-server=10.201.0.1
        --volfile-server=10.201.0.8:10.201.0.5:10.201.0.6:10.201.0.7:10.201.0.9
        --volfile-id=/docker2 /mnt/docker2
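As a side note, one way to capture that same RSS figure over time is a small snapshot like this (using the shell's own PID so the snippet runs anywhere; substitute 17317 on the affected machine):

```shell
# Snapshot a process's resident set size in MiB; on the affected node the
# PID would be 17317 (the glusterfs fuse client from the ps output above).
pid=$$   # illustrative; replace with the glusterfs PID
rss_kib=$(ps -o rss= -p "$pid")
echo "pid=$pid rss=$((rss_kib / 1024))MiB"
```

Run periodically (e.g. from cron), this gives a growth curve that makes a leak easier to demonstrate than a single ps reading.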

        The gluster version on this machine is 7.8, but I'm currently
        running a mixed cluster of 6.10 and 7.8, holding off on the
        rest of the upgrade because of the issue mentioned earlier
        with the self-heal daemon.

        The affected volume info looks like;

        # gluster v info docker2

        Volume Name: docker2
        Type: Distributed-Replicate
        Volume ID: 4e0670a0-3d00-4360-98bd-3da844cedae5
        Status: Started
        Snapshot Count: 0
        Number of Bricks: 3 x (2 + 1) = 9
        Transport-type: tcp
        Bricks:
        Brick1: 10.201.0.5:/data0/gfs/bricks/brick1/docker2
        Brick2: 10.201.0.9:/data0/gfs/bricks/brick1/docker2
        Brick3: 10.201.0.3:/data0/gfs/bricks/bricka/docker2 (arbiter)
        Brick4: 10.201.0.6:/data0/gfs/bricks/brick1/docker2
        Brick5: 10.201.0.7:/data0/gfs/bricks/brick1/docker2
        Brick6: 10.201.0.4:/data0/gfs/bricks/bricka/docker2 (arbiter)
        Brick7: 10.201.0.1:/data0/gfs/bricks/brick1/docker2
        Brick8: 10.201.0.8:/data0/gfs/bricks/brick1/docker2
        Brick9: 10.201.0.2:/data0/gfs/bricks/bricka/docker2 (arbiter)
        Options Reconfigured:
        performance.cache-size: 128MB
        transport.address-family: inet
        nfs.disable: on
        cluster.brick-multiplex: on

        The issue seems to be triggered by a program called zammad,
        which has an init process that runs in a loop; each cycle it
        recompiles the Ruby on Rails application.
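For context, that access pattern looks roughly like the following hypothetical reproducer (written against /tmp; pointing the cache directory at the fuse mount would exercise the same write-then-rename path that produces the dht_rename messages below):

```shell
# Mimic bootsnap's cache writes: each entry is written to a .tmp.XXXXXX file
# and then renamed over the final cache path, once per compile cycle.
cache=/tmp/bootsnap-compile-cache/95   # on the real system: under the gluster mount
mkdir -p "$cache"
for i in $(seq 1 100); do
  tmp=$(mktemp "$cache/75f93c20e375c5.tmp.XXXXXX")
  echo "compiled-entry-$i" > "$tmp"
  mv "$tmp" "$cache/75f93c20e375c5"   # this rename is what dht logs
done
```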

        I've attached 2 statedumps, but as I only recently noticed
        the high memory usage, I believe both statedumps already show
        an escalated state of the glusterfs process. If you also need
        them from the beginning, let me know. The dumps were taken
        about an hour apart.
        I've also included the glusterd.log. I couldn't include
        mnt-docker2.log since it's too large, because it's littered
        with: "I [MSGID: 109066] [dht-rename.c:1951:dht_rename]
        0-docker2-dht".
        However, I've inspected the log and it contains no Error
        messages; all are of the Info kind and look like these:
        [2020-11-19 03:29:05.406766] I [glusterfsd-mgmt.c:2282:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing
        [2020-11-19 03:29:21.271886] I [socket.c:865:__socket_shutdown] 0-docker2-client-8: intentional socket shutdown(5)
        [2020-11-19 03:29:24.479738] I [socket.c:865:__socket_shutdown] 0-docker2-client-2: intentional socket shutdown(5)
        [2020-11-19 03:30:12.318146] I [socket.c:865:__socket_shutdown] 0-docker2-client-5: intentional socket shutdown(5)
        [2020-11-19 03:31:27.381720] I [socket.c:865:__socket_shutdown] 0-docker2-client-8: intentional socket shutdown(5)
        [2020-11-19 03:31:30.579630] I [socket.c:865:__socket_shutdown] 0-docker2-client-2: intentional socket shutdown(5)
        [2020-11-19 03:32:18.427364] I [socket.c:865:__socket_shutdown] 0-docker2-client-5: intentional socket shutdown(5)

        The rename messages look like these:
        [2020-11-19 03:29:05.402663] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5.tmp.eVcE5D (fe083b7e-b0d5-485c-8666-e1f7cdac33e2) (hash=docker2-replicate-2/cache=docker2-replicate-2) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5 ((null)) (hash=docker2-replicate-2/cache=<nul>)
        [2020-11-19 03:29:05.410972] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff.tmp.AdDTLu (b1edadad-1d48-4bf4-be85-ffbe9d69d338) (hash=docker2-replicate-1/cache=docker2-replicate-1) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff ((null)) (hash=docker2-replicate-2/cache=<nul>)
        [2020-11-19 03:29:05.420064] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3.tmp.QKmxul (31f80fcb-977c-433b-9259-5fdfcad1171c) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3 ((null)) (hash=docker2-replicate-0/cache=<nul>)
        [2020-11-19 03:29:05.427537] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009.tmp.qLUMec (e2fdf971-731f-4765-80e8-3165433488ea) (hash=docker2-replicate-2/cache=docker2-replicate-2) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009 ((null)) (hash=docker2-replicate-1/cache=<nul>)
        [2020-11-19 03:29:05.440576] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36.tmp.4qvl22 (3e0bc6d1-13ac-47c6-b221-1256b4b506ef) (hash=docker2-replicate-2/cache=docker2-replicate-2) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36 ((null)) (hash=docker2-replicate-1/cache=<nul>)
        [2020-11-19 03:29:05.452407] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e.tmp.iIweTT (9685b5f3-4b14-4050-9b00-1163856239b5) (hash=docker2-replicate-1/cache=docker2-replicate-1) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e ((null)) (hash=docker2-replicate-0/cache=<nul>)
        [2020-11-19 03:29:05.460720] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025.tmp.0W7jMK (d0a8d0a4-c783-45db-bb4a-68e24044d830) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025 ((null)) (hash=docker2-replicate-1/cache=<nul>)
        [2020-11-19 03:29:05.468800] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb.tmp.2yXtHB (e5b61ef5-a3c2-4a2c-aa47-c377a6c090d7) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb ((null)) (hash=docker2-replicate-0/cache=<nul>)
        [2020-11-19 03:29:05.476745] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7.tmp.gSkiEs (17181a40-f9b2-438f-9dfc-7bb159c516e6) (hash=docker2-replicate-2/cache=docker2-replicate-2) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7 ((null)) (hash=docker2-replicate-0/cache=<nul>)
        [2020-11-19 03:29:05.486729] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a.tmp.sVT0Dj (cb6b1d52-b1c0-420c-86b7-2ceb8e8e73db) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a ((null)) (hash=docker2-replicate-1/cache=<nul>)
        [2020-11-19 03:29:05.495115] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b.tmp.QdPTFa (d8450d9e-62a7-4fd5-9dd2-e072e318d9a0) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b ((null)) (hash=docker2-replicate-1/cache=<nul>)
        [2020-11-19 03:29:05.503424] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0.tmp.s1xUJ1 (ffc57a77-8b91-4264-8e2d-a9966f0f37ef) (hash=docker2-replicate-1/cache=docker2-replicate-1) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0 ((null)) (hash=docker2-replicate-2/cache=<nul>)
        [2020-11-19 03:29:05.513532] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad.tmp.A5DzQS (5a595a65-372d-4377-b547-2c4e23f7be3a) (hash=docker2-replicate-1/cache=docker2-replicate-1) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad ((null)) (hash=docker2-replicate-0/cache=<nul>)
        [2020-11-19 03:29:05.526885] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe.tmp.IMXg0J (2fa99fcd-64f8-4934-aeda-b356816f1132) (hash=docker2-replicate-2/cache=docker2-replicate-2) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe ((null)) (hash=docker2-replicate-2/cache=<nul>)
        [2020-11-19 03:29:05.537637] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b.tmp.Y2L0cB (db24d7bf-4a06-4356-a52e-1ab9537d1c3a) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b ((null)) (hash=docker2-replicate-1/cache=<nul>)
        [2020-11-19 03:29:05.547878] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5.tmp.u47rss (b12f041b-5bbd-4e3d-b700-8f673830393f) (hash=docker2-replicate-1/cache=docker2-replicate-1) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5 ((null)) (hash=docker2-replicate-1/cache=<nul>)

        If I can provide any more information, please let me know.

        Thanks Olaf


        ________



        Community Meeting Calendar:

        Schedule -
        Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
        Bridge: https://meet.google.com/cpu-eiue-hvk
        Gluster-users mailing list
        [email protected]
        https://lists.gluster.org/mailman/listinfo/gluster-users



