On 26/11/20 4:00 pm, Olaf Buitelaar wrote:
Hi Ravi,

I could try that, but I can only try a setup on VMs, and will not be able to set up an environment like our production environment, which runs on physical machines, has actual production load, etc. So the two setups would be quite different. Personally I think it would be best to debug the actual machines instead of trying to reproduce it, since reproducing the issue on the physical machines is just a matter of swapping the repositories and upgrading the packages.
Let me know what you think.

Physical machines or VMs - anything is fine. The only thing is I cannot guarantee quick responses, so if it is a production machine, it will be an issue for you. Any setup you can use for experimenting is fine. You don't need any clients for the testing. Just create a 1x2 replica volume using 2 nodes and start it, then upgrade one node and see if the shd and bricks come up on that node.
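For example, something along these lines (hostnames and brick paths are just placeholders):

gluster volume create testvol replica 2 node1:/bricks/brick1/testvol node2:/bricks/brick1/testvol
gluster volume start testvol

(Answer 'y' to the replica-2 split-brain warning; that is fine for a throwaway test volume.) After upgrading the packages and restarting glusterd on one node, 'gluster volume status testvol' should show whether that node's brick and Self-heal Daemon are online.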

-Ravi


Thanks Olaf

On Thu, 26 Nov 2020 at 02:43, Ravishankar N <[email protected]> wrote:


    On 25/11/20 7:17 pm, Olaf Buitelaar wrote:
    Hi Ravi,

    Thanks for checking. Unfortunately this is our production system;
    what I've done is simply change the yum repo from gluster-6 to
    http://mirror.centos.org/centos/$releasever/storage/$basearch/gluster-7/
    and run a yum upgrade. I restarted the glusterd process several
    times and also tried rebooting the machine. I didn't touch the
    op-version yet, which is still at 60000; usually I only do that
    once all nodes are upgraded and running stable.
    We're running multiple volumes with different configurations, but
    for none of the volumes does the shd start on the upgraded nodes.
    Is there anything further I could check/do to get to the bottom
    of this?

    Hi Olaf, like I said, would it be possible to create a test setup
    to see if you can recreate it?

    Regards,
    Ravi

    Thanks Olaf

    On Wed, 25 Nov 2020 at 14:14, Ravishankar N <[email protected]> wrote:


        On 25/11/20 5:50 pm, Olaf Buitelaar wrote:
        Hi Ashish,

        Thank you for looking into this. I indeed also suspect it
        has something to do with the 7.X client, because on the 6.X
        clients the issue doesn't really seem to occur.
        I would love to update everything to 7.X, but since the
        self-heal daemons won't start
        (https://lists.gluster.org/pipermail/gluster-users/2020-November/038917.html),
        I halted the full upgrade.

        Olaf, based on your email, I did try to upgrade one node of a
        3-node replica 3 setup from 6.10 to 7.8 on my test VMs, and I
        found that the self-heal daemon (and the bricks) came online
        after I restarted glusterd post-upgrade on that node (I did
        not touch the op-version), so I did not spend time on it
        further. I don't think the problem is related to the shd mux
        changes I referred to. But if you have a test setup where you
        can reproduce this, please raise a github issue with the
        details.

        Thanks,
        Ravi
        Hopefully that issue will be addressed in the upcoming
        release. Once I have everything running on the same version,
        I'll check whether the issue still occurs and reach out if
        that's the case.

        Thanks Olaf

        On Wed, 25 Nov 2020 at 10:42, Ashish Pandey <[email protected]> wrote:


            Hi,

            I checked the statedump and found some very high memory
            allocations.
            grep -rwn "num_allocs" glusterdump.17317.dump.1605* | cut -d'=' -f2 | sort

            30003616
            30003616
            3305
            3305
            36960008
            36960008
            38029944
            38029944
            38450472
            38450472
            39566824
            39566824
            4
            I did check the lines in the statedump and it could be
            happening in protocol/client. However, I did not find
            anything suspicious in my quick code exploration.
            I would suggest upgrading all the nodes to the latest
            version, then starting your workload and seeing if there
            is still any high memory usage.
            That way it will also be easier to debug this issue.
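            (Note the listing above is sorted lexicographically; piping
            through "sort -n" instead would order it numerically.) To see
            which translator and allocation type the largest counters
            belong to, something like this should work, assuming the
            usual memusage layout of the statedump:

            grep -rn -B5 "num_allocs=39566824" glusterdump.17317.dump.1605*

            Each num_allocs line sits inside a [<xlator> - usage-type
            <type> memusage] section, so the -B5 context should include
            the owning section header.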

            ---
            Ashish

            
            ------------------------------------------------------------------------
            *From: *"Olaf Buitelaar" <[email protected]>
            *To: *"gluster-users" <[email protected]>
            *Sent: *Thursday, November 19, 2020 10:28:57 PM
            *Subject: *[Gluster-users] possible memory leak in
            client/fuse mount

            Dear Gluster Users,

            I have a glusterfs process which consumes almost all the
            memory of the machine (~58GB):

            # ps -faxu|grep 17317
            root     17317  3.1 88.9 59695516 58479708 ?   Ssl  Oct31 839:36 /usr/sbin/glusterfs --process-name fuse --volfile-server=10.201.0.1 --volfile-server=10.201.0.8:10.201.0.5:10.201.0.6:10.201.0.7:10.201.0.9 --volfile-id=/docker2 /mnt/docker2
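            To track how quickly it grows, I can log the resident set
            size periodically with something like:

            while sleep 3600; do date; ps -o rss= -p 17317; done >> glusterfs-rss.log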

            The gluster version on this machine is 7.8, but I'm
            currently running a mixed cluster of 6.10 and 7.8 while
            waiting to proceed with the upgrade because of the issue
            mentioned earlier with the self-heal daemon.

            The affected volume info looks like:

            # gluster v info docker2

            Volume Name: docker2
            Type: Distributed-Replicate
            Volume ID: 4e0670a0-3d00-4360-98bd-3da844cedae5
            Status: Started
            Snapshot Count: 0
            Number of Bricks: 3 x (2 + 1) = 9
            Transport-type: tcp
            Bricks:
            Brick1: 10.201.0.5:/data0/gfs/bricks/brick1/docker2
            Brick2: 10.201.0.9:/data0/gfs/bricks/brick1/docker2
            Brick3: 10.201.0.3:/data0/gfs/bricks/bricka/docker2
            (arbiter)
            Brick4: 10.201.0.6:/data0/gfs/bricks/brick1/docker2
            Brick5: 10.201.0.7:/data0/gfs/bricks/brick1/docker2
            Brick6: 10.201.0.4:/data0/gfs/bricks/bricka/docker2
            (arbiter)
            Brick7: 10.201.0.1:/data0/gfs/bricks/brick1/docker2
            Brick8: 10.201.0.8:/data0/gfs/bricks/brick1/docker2
            Brick9: 10.201.0.2:/data0/gfs/bricks/bricka/docker2
            (arbiter)
            Options Reconfigured:
            performance.cache-size: 128MB
            transport.address-family: inet
            nfs.disable: on
            cluster.brick-multiplex: on

            The issue seems to be triggered by a program called
            zammad, which has an init process that runs in a loop;
            each cycle it re-compiles the Ruby on Rails application.
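            As far as I can tell the pattern is simply bootsnap writing
            each cache entry to a temp file and renaming it over the
            final name on the fuse mount; a rough approximation
            (directory and file names made up for illustration) would be:

            cd /mnt/docker2/corporate/zammad/tmp/init/cache/bootsnap-compile-cache
            while true; do
                f=$(mktemp entry.XXXXXX)          # temp file in the cache dir, as bootsnap does
                head -c 4096 /dev/urandom > "$f"  # dummy compile-cache payload
                mv "$f" entry                     # rename into place, producing the dht_rename logs below
            done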

            I've attached 2 statedumps, but as I only recently
            noticed the high memory usage, I believe both statedumps
            already show an escalated state of the glusterfs process.
            If you also need dumps from the beginning, let me know.
            The dumps were taken about an hour apart.
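            For reference, statedumps of the fuse client can be taken
            by sending SIGUSR1 to the process; by default the dump
            files land under /var/run/gluster:

            kill -USR1 17317
            ls /var/run/gluster/glusterdump.17317.dump.*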
            Also I've included the glusterd.log. I couldn't include
            mnt-docker2.log since it's too large, as it's littered
            with: "I [MSGID: 109066] [dht-rename.c:1951:dht_rename]
            0-docker2-dht". However, I've inspected the log and it
            contains no Error messages; all are of the Info kind and
            look like these:
            [2020-11-19 03:29:05.406766] I [glusterfsd-mgmt.c:2282:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing
            [2020-11-19 03:29:21.271886] I [socket.c:865:__socket_shutdown] 0-docker2-client-8: intentional socket shutdown(5)
            [2020-11-19 03:29:24.479738] I [socket.c:865:__socket_shutdown] 0-docker2-client-2: intentional socket shutdown(5)
            [2020-11-19 03:30:12.318146] I [socket.c:865:__socket_shutdown] 0-docker2-client-5: intentional socket shutdown(5)
            [2020-11-19 03:31:27.381720] I [socket.c:865:__socket_shutdown] 0-docker2-client-8: intentional socket shutdown(5)
            [2020-11-19 03:31:30.579630] I [socket.c:865:__socket_shutdown] 0-docker2-client-2: intentional socket shutdown(5)
            [2020-11-19 03:32:18.427364] I [socket.c:865:__socket_shutdown] 0-docker2-client-5: intentional socket shutdown(5)

            The rename messages look like these:
            [2020-11-19 03:29:05.402663] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5.tmp.eVcE5D (fe083b7e-b0d5-485c-8666-e1f7cdac33e2) (hash=docker2-replicate-2/cache=docker2-replicate-2) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5 ((null)) (hash=docker2-replicate-2/cache=<nul>)
            [2020-11-19 03:29:05.410972] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff.tmp.AdDTLu (b1edadad-1d48-4bf4-be85-ffbe9d69d338) (hash=docker2-replicate-1/cache=docker2-replicate-1) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff ((null)) (hash=docker2-replicate-2/cache=<nul>)
            [2020-11-19 03:29:05.420064] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3.tmp.QKmxul (31f80fcb-977c-433b-9259-5fdfcad1171c) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3 ((null)) (hash=docker2-replicate-0/cache=<nul>)
            [2020-11-19 03:29:05.427537] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009.tmp.qLUMec (e2fdf971-731f-4765-80e8-3165433488ea) (hash=docker2-replicate-2/cache=docker2-replicate-2) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009 ((null)) (hash=docker2-replicate-1/cache=<nul>)
            [2020-11-19 03:29:05.440576] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36.tmp.4qvl22 (3e0bc6d1-13ac-47c6-b221-1256b4b506ef) (hash=docker2-replicate-2/cache=docker2-replicate-2) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36 ((null)) (hash=docker2-replicate-1/cache=<nul>)
            [2020-11-19 03:29:05.452407] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e.tmp.iIweTT (9685b5f3-4b14-4050-9b00-1163856239b5) (hash=docker2-replicate-1/cache=docker2-replicate-1) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e ((null)) (hash=docker2-replicate-0/cache=<nul>)
            [2020-11-19 03:29:05.460720] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025.tmp.0W7jMK (d0a8d0a4-c783-45db-bb4a-68e24044d830) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025 ((null)) (hash=docker2-replicate-1/cache=<nul>)
            [2020-11-19 03:29:05.468800] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb.tmp.2yXtHB (e5b61ef5-a3c2-4a2c-aa47-c377a6c090d7) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb ((null)) (hash=docker2-replicate-0/cache=<nul>)
            [2020-11-19 03:29:05.476745] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7.tmp.gSkiEs (17181a40-f9b2-438f-9dfc-7bb159c516e6) (hash=docker2-replicate-2/cache=docker2-replicate-2) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7 ((null)) (hash=docker2-replicate-0/cache=<nul>)
            [2020-11-19 03:29:05.486729] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a.tmp.sVT0Dj (cb6b1d52-b1c0-420c-86b7-2ceb8e8e73db) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a ((null)) (hash=docker2-replicate-1/cache=<nul>)
            [2020-11-19 03:29:05.495115] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b.tmp.QdPTFa (d8450d9e-62a7-4fd5-9dd2-e072e318d9a0) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b ((null)) (hash=docker2-replicate-1/cache=<nul>)
            [2020-11-19 03:29:05.503424] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0.tmp.s1xUJ1 (ffc57a77-8b91-4264-8e2d-a9966f0f37ef) (hash=docker2-replicate-1/cache=docker2-replicate-1) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0 ((null)) (hash=docker2-replicate-2/cache=<nul>)
            [2020-11-19 03:29:05.513532] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad.tmp.A5DzQS (5a595a65-372d-4377-b547-2c4e23f7be3a) (hash=docker2-replicate-1/cache=docker2-replicate-1) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad ((null)) (hash=docker2-replicate-0/cache=<nul>)
            [2020-11-19 03:29:05.526885] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe.tmp.IMXg0J (2fa99fcd-64f8-4934-aeda-b356816f1132) (hash=docker2-replicate-2/cache=docker2-replicate-2) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe ((null)) (hash=docker2-replicate-2/cache=<nul>)
            [2020-11-19 03:29:05.537637] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b.tmp.Y2L0cB (db24d7bf-4a06-4356-a52e-1ab9537d1c3a) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b ((null)) (hash=docker2-replicate-1/cache=<nul>)
            [2020-11-19 03:29:05.547878] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5.tmp.u47rss (b12f041b-5bbd-4e3d-b700-8f673830393f) (hash=docker2-replicate-1/cache=docker2-replicate-1) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5 ((null)) (hash=docker2-replicate-1/cache=<nul>)

            If I can provide any more information, please let me know.

            Thanks Olaf





________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users
