On 25/11/20 5:50 pm, Olaf Buitelaar wrote:
Hi Ashish,

Thank you for looking into this. I indeed also suspect it has something to do with the 7.x client, because on the 6.x clients the issue doesn't really seem to occur. I would love to update everything to 7.x, but since the self-heal daemons (https://lists.gluster.org/pipermail/gluster-users/2020-November/038917.html) won't start, I halted the full upgrade.

Olaf, based on your email, I tried upgrading one node of a 3-node replica 3 setup from 6.10 to 7.8 on my test VMs, and I found that the self-heal daemon (and the bricks) came online after I restarted glusterd post-upgrade on that node. (I did not touch the op-version, and I did not spend further time on it.) So I don't think the problem is related to the shd mux changes I referred to. But if you have a test setup where you can reproduce this, please raise a GitHub issue with the details.

Thanks,
Ravi
Hopefully that issue will be addressed in the upcoming release. Once I have everything running on the same version, I'll check whether the issue still occurs and reach out if that's the case.

Thanks Olaf

On Wed, 25 Nov 2020 at 10:42, Ashish Pandey <[email protected]> wrote:


    Hi,

    I checked the statedump and found some very high memory allocations:
    # grep -rwn "num_allocs" glusterdump.17317.dump.1605* | cut -d'=' -f2 | sort

    30003616
    30003616
    3305
    3305
    36960008
    36960008
    38029944
    38029944
    38450472
    38450472
    39566824
    39566824
    4
    I checked those lines in the statedump, and the allocations could be
    happening in protocol/client. However, I did not find anything suspicious
    in my quick code exploration.
    I would suggest upgrading all the nodes to the latest version, then
    starting your workload and checking whether memory usage still grows
    that high. That way it will also be easier to debug this issue.
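
    To see which allocation types those counts belong to, something like
    this should work (a rough sketch, assuming the usual statedump layout
    of a "[... memusage]" section header followed by num_allocs= lines):

    # print each num_allocs value next to its section header, largest counts last
    awk -F'=' '/^\[/ {sec=$0} /^num_allocs=/ {print $2, sec}' \
        glusterdump.17317.dump.1605* | sort -n | tail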

    ---
    Ashish

    ------------------------------------------------------------------------
    *From: *"Olaf Buitelaar" <[email protected]>
    *To: *"gluster-users" <[email protected]>
    *Sent: *Thursday, November 19, 2020 10:28:57 PM
    *Subject: *[Gluster-users] possible memory leak in client/fuse mount

    Dear Gluster Users,

    I have a glusterfs process which consumes nearly all memory of the
    machine (~58 GB):

    # ps -faxu|grep 17317
    root     17317  3.1 88.9 59695516 58479708 ?   Ssl  Oct31 839:36 /usr/sbin/glusterfs --process-name fuse --volfile-server=10.201.0.1 --volfile-server=10.201.0.8:10.201.0.5:10.201.0.6:10.201.0.7:10.201.0.9 --volfile-id=/docker2 /mnt/docker2
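
    To see how fast it grows, a minimal sketch like the following logs the
    client's resident set size (in KiB) every ten minutes; the PID and
    output path are just this example's values:

    # append a timestamped RSS sample for the fuse client every 10 minutes
    while true; do
        echo "$(date -u +%FT%TZ) $(ps -o rss= -p 17317)" >> /var/tmp/glusterfs-17317-rss.log
        sleep 600
    done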

    The Gluster version on this machine is 7.8, but I'm currently running
    a mixed cluster of 6.10 and 7.8, with the rest of the upgrade on hold
    because of the self-heal daemon issue mentioned earlier.

    The affected volume info looks like this:

    # gluster v info docker2

    Volume Name: docker2
    Type: Distributed-Replicate
    Volume ID: 4e0670a0-3d00-4360-98bd-3da844cedae5
    Status: Started
    Snapshot Count: 0
    Number of Bricks: 3 x (2 + 1) = 9
    Transport-type: tcp
    Bricks:
    Brick1: 10.201.0.5:/data0/gfs/bricks/brick1/docker2
    Brick2: 10.201.0.9:/data0/gfs/bricks/brick1/docker2
    Brick3: 10.201.0.3:/data0/gfs/bricks/bricka/docker2 (arbiter)
    Brick4: 10.201.0.6:/data0/gfs/bricks/brick1/docker2
    Brick5: 10.201.0.7:/data0/gfs/bricks/brick1/docker2
    Brick6: 10.201.0.4:/data0/gfs/bricks/bricka/docker2 (arbiter)
    Brick7: 10.201.0.1:/data0/gfs/bricks/brick1/docker2
    Brick8: 10.201.0.8:/data0/gfs/bricks/brick1/docker2
    Brick9: 10.201.0.2:/data0/gfs/bricks/bricka/docker2 (arbiter)
    Options Reconfigured:
    performance.cache-size: 128MB
    transport.address-family: inet
    nfs.disable: on
    cluster.brick-multiplex: on

    The issue seems to be triggered by a program called Zammad, which has
    an init process that runs in a loop; each cycle it re-compiles the
    Ruby on Rails application.
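
    (A hypothetical reproducer, in case it helps: the pattern in the logs
    below is essentially "write a tmp file, then rename it into place", so
    a loop like this on the fuse mount should generate the same kind of
    load; the directory and iteration count are made up:)

    # mimic bootsnap's write-then-rename cycle on the gluster mount
    mkdir -p /mnt/docker2/rename-test && cd /mnt/docker2/rename-test
    for i in $(seq 1 100000); do
        echo data > "cache-$i.tmp"
        mv "cache-$i.tmp" "cache-$i"
    done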

    I've attached 2 statedumps, but as I only recently noticed the high
    memory usage, I believe both statedumps already show an escalated
    state of the glusterfs process. If it's needed to also have them from
    the beginning, let me know. The dumps were taken about an hour apart.
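
    (For reference, these dumps can be regenerated at any time: sending
    SIGUSR1 to a glusterfs process makes it write a statedump, by default
    under /var/run/gluster, though the directory can vary per build:)

    # trigger a fresh statedump of the fuse client and list the newest dumps
    kill -USR1 17317
    ls -lt /var/run/gluster/glusterdump.17317.dump.* | head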
    Also I've included the glusterd.log. I couldn't include
    mnt-docker2.log since it's too large; it's littered with
    "I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht"
    entries. However, I've inspected the log and it contains no Error
    messages; all are of the Info kind and look like these:
    [2020-11-19 03:29:05.406766] I [glusterfsd-mgmt.c:2282:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing
    [2020-11-19 03:29:21.271886] I [socket.c:865:__socket_shutdown] 0-docker2-client-8: intentional socket shutdown(5)
    [2020-11-19 03:29:24.479738] I [socket.c:865:__socket_shutdown] 0-docker2-client-2: intentional socket shutdown(5)
    [2020-11-19 03:30:12.318146] I [socket.c:865:__socket_shutdown] 0-docker2-client-5: intentional socket shutdown(5)
    [2020-11-19 03:31:27.381720] I [socket.c:865:__socket_shutdown] 0-docker2-client-8: intentional socket shutdown(5)
    [2020-11-19 03:31:30.579630] I [socket.c:865:__socket_shutdown] 0-docker2-client-2: intentional socket shutdown(5)
    [2020-11-19 03:32:18.427364] I [socket.c:865:__socket_shutdown] 0-docker2-client-5: intentional socket shutdown(5)

    The rename messages look like these:
    [2020-11-19 03:29:05.402663] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5.tmp.eVcE5D (fe083b7e-b0d5-485c-8666-e1f7cdac33e2) (hash=docker2-replicate-2/cache=docker2-replicate-2) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5 ((null)) (hash=docker2-replicate-2/cache=<nul>)
    [2020-11-19 03:29:05.410972] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff.tmp.AdDTLu (b1edadad-1d48-4bf4-be85-ffbe9d69d338) (hash=docker2-replicate-1/cache=docker2-replicate-1) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff ((null)) (hash=docker2-replicate-2/cache=<nul>)
    [2020-11-19 03:29:05.420064] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3.tmp.QKmxul (31f80fcb-977c-433b-9259-5fdfcad1171c) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3 ((null)) (hash=docker2-replicate-0/cache=<nul>)
    [2020-11-19 03:29:05.427537] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009.tmp.qLUMec (e2fdf971-731f-4765-80e8-3165433488ea) (hash=docker2-replicate-2/cache=docker2-replicate-2) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009 ((null)) (hash=docker2-replicate-1/cache=<nul>)
    [2020-11-19 03:29:05.440576] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36.tmp.4qvl22 (3e0bc6d1-13ac-47c6-b221-1256b4b506ef) (hash=docker2-replicate-2/cache=docker2-replicate-2) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36 ((null)) (hash=docker2-replicate-1/cache=<nul>)
    [2020-11-19 03:29:05.452407] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e.tmp.iIweTT (9685b5f3-4b14-4050-9b00-1163856239b5) (hash=docker2-replicate-1/cache=docker2-replicate-1) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e ((null)) (hash=docker2-replicate-0/cache=<nul>)
    [2020-11-19 03:29:05.460720] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025.tmp.0W7jMK (d0a8d0a4-c783-45db-bb4a-68e24044d830) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025 ((null)) (hash=docker2-replicate-1/cache=<nul>)
    [2020-11-19 03:29:05.468800] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb.tmp.2yXtHB (e5b61ef5-a3c2-4a2c-aa47-c377a6c090d7) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb ((null)) (hash=docker2-replicate-0/cache=<nul>)
    [2020-11-19 03:29:05.476745] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7.tmp.gSkiEs (17181a40-f9b2-438f-9dfc-7bb159c516e6) (hash=docker2-replicate-2/cache=docker2-replicate-2) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7 ((null)) (hash=docker2-replicate-0/cache=<nul>)
    [2020-11-19 03:29:05.486729] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a.tmp.sVT0Dj (cb6b1d52-b1c0-420c-86b7-2ceb8e8e73db) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a ((null)) (hash=docker2-replicate-1/cache=<nul>)
    [2020-11-19 03:29:05.495115] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b.tmp.QdPTFa (d8450d9e-62a7-4fd5-9dd2-e072e318d9a0) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b ((null)) (hash=docker2-replicate-1/cache=<nul>)
    [2020-11-19 03:29:05.503424] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0.tmp.s1xUJ1 (ffc57a77-8b91-4264-8e2d-a9966f0f37ef) (hash=docker2-replicate-1/cache=docker2-replicate-1) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0 ((null)) (hash=docker2-replicate-2/cache=<nul>)
    [2020-11-19 03:29:05.513532] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad.tmp.A5DzQS (5a595a65-372d-4377-b547-2c4e23f7be3a) (hash=docker2-replicate-1/cache=docker2-replicate-1) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad ((null)) (hash=docker2-replicate-0/cache=<nul>)
    [2020-11-19 03:29:05.526885] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe.tmp.IMXg0J (2fa99fcd-64f8-4934-aeda-b356816f1132) (hash=docker2-replicate-2/cache=docker2-replicate-2) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe ((null)) (hash=docker2-replicate-2/cache=<nul>)
    [2020-11-19 03:29:05.537637] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b.tmp.Y2L0cB (db24d7bf-4a06-4356-a52e-1ab9537d1c3a) (hash=docker2-replicate-0/cache=docker2-replicate-0) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b ((null)) (hash=docker2-replicate-1/cache=<nul>)
    [2020-11-19 03:29:05.547878] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5.tmp.u47rss (b12f041b-5bbd-4e3d-b700-8f673830393f) (hash=docker2-replicate-1/cache=docker2-replicate-1) => /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5 ((null)) (hash=docker2-replicate-1/cache=<nul>)
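
    (To quantify the noise, counting the MSGID 109066 entries per minute
    works with a one-liner like this; the log path is where this mount's
    log lives on my node, and the timestamps occupy columns 1-17:)

    # count dht_rename messages per minute, most recent minutes last
    grep 'MSGID: 109066' /var/log/glusterfs/mnt-docker2.log | cut -c1-17 | sort | uniq -c | tail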

    If I can provide any more information, please let me know.

    Thanks Olaf


    ________



    Community Meeting Calendar:

    Schedule -
    Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
    Bridge: https://meet.google.com/cpu-eiue-hvk
    Gluster-users mailing list
    [email protected]
    https://lists.gluster.org/mailman/listinfo/gluster-users

