Thanks for all the information. For pstack output, the gluster-debuginfo package has to be installed. I will check out the provided information and get back to you.
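For Debian stretch specifically (the question asked below), there is usually no "gluster-debuginfo" package; debug symbols come as auto-built -dbgsym packages from the Debian debug archive. A rough sketch of the steps, where the exact package names are assumptions to be verified with apt-cache search:

```shell
# Enable the Debian debug-symbol archive (the stretch-debug suite).
echo 'deb http://deb.debian.org/debian-debug/ stretch-debug main' \
  > /etc/apt/sources.list.d/debug.list
apt-get update

# Package names are assumptions -- verify with:
#   apt-cache search glusterfs | grep dbgsym
apt-get install glusterfs-server-dbgsym glusterfs-common-dbgsym

# With symbols installed, pstack should print named stack frames
# instead of "(No symbols found)":
pstack "$(cat /var/run/glusterd.pid)"
```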
On Wed, Apr 22, 2020 at 11:54 AM <[email protected]> wrote:

> I think all issues are linked to the same underlying problem.
>
> 1. All peers were in Connected state from every node yesterday, but node 2
> is "semi-connected" now:
>
> root@glusterDevVM1:/var/log/glusterfs# gluster peer status
> Number of Peers: 2
>
> Hostname: glusterDevVM3
> Uuid: 0d8a3686-9e37-4ce7-87bf-c85d1ec40974
> State: Peer in Cluster (Connected)
>
> Hostname: glusterDevVM2
> Uuid: 7f6c3023-144b-4db2-9063-d90926dbdd18
> State: Peer in Cluster (Connected)
>
> root@glusterDevVM2:~# gluster peer status
> Number of Peers: 2
>
> Hostname: glusterDevVM1
> Uuid: e2263e4d-a307-45d5-9cec-e1791f7a45fb
> State: Peer in Cluster (Disconnected)
>
> Hostname: glusterDevVM3
> Uuid: 0d8a3686-9e37-4ce7-87bf-c85d1ec40974
> State: Peer in Cluster (Connected)
>
> root@glusterDevVM3:~# gluster peer status
> Number of Peers: 2
>
> Hostname: glusterDevVM2
> Uuid: 7f6c3023-144b-4db2-9063-d90926dbdd18
> State: Peer in Cluster (Connected)
>
> Hostname: glusterDevVM1
> Uuid: e2263e4d-a307-45d5-9cec-e1791f7a45fb
> State: Peer in Cluster (Connected)
>
> 2, 3, 4. A simple gluster volume status shows multiple errors on each node:
>
> root@glusterDevVM1:~# gluster volume status tmp
> Locking failed on glusterDevVM2. Please check log file for details.
>
> root@glusterDevVM2:~# gluster volume status tmp
> Another transaction is in progress for tmp. Please try again after some time.
> root@glusterDevVM3:~# gluster volume status tmp
> Error : Request timed out
>
> Logs for each node (except SSL errors):
>
> root@glusterDevVM1:~# egrep -v '0-socket.management' /var/log/glusterfs/glusterd.log
> [2020-04-22 05:38:32.278618] E [rpc-clnt.c:346:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x138)[0x7fd28d99fda8] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcd97)[0x7fd28d745d97] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcebe)[0x7fd28d745ebe] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)[0x7fd28d746e93] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xea18)[0x7fd28d747a18] ))))) 0-management: forced unwinding frame type(Peer mgmt) op(--(4)) called at 2020-04-22 05:38:32.243087 (xid=0x8d)
> [2020-04-22 05:38:32.278638] E [MSGID: 106157] [glusterd-rpc-ops.c:665:__glusterd_friend_update_cbk] 0-management: RPC Error
> [2020-04-22 05:38:32.278651] I [MSGID: 106493] [glusterd-rpc-ops.c:681:__glusterd_friend_update_cbk] 0-management: Received RJT from uuid: 00000000-0000-0000-0000-000000000000
> [2020-04-22 05:38:33.256401] I [MSGID: 106498] [glusterd-svc-helper.c:747:__glusterd_send_svc_configure_req] 0-management: not connected yet
> [2020-04-22 05:38:35.279149] I [socket.c:4347:ssl_setup_connection_params] 0-management: SSL support on the I/O path is ENABLED
> [2020-04-22 05:38:35.279169] I [socket.c:4350:ssl_setup_connection_params] 0-management: SSL support for glusterd is ENABLED
> [2020-04-22 05:38:35.279178] I [socket.c:4360:ssl_setup_connection_params] 0-management: using certificate depth 1
> The message "I [MSGID: 106004] [glusterd-handler.c:6204:__glusterd_peer_rpc_notify] 0-management: Peer <glusterDevVM2> (<7f6c3023-144b-4db2-9063-d90926dbdd18>), in state <Peer in Cluster>, has disconnected from glusterd." repeated 3 times between [2020-04-22 05:38:25.232116] and [2020-04-22 05:38:35.667153]
> [2020-04-22 05:38:35.667255] W [glusterd-locks.c:796:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x22119) [0x7fd287da7119] -->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x2caae) [0x7fd287db1aae] -->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0xdf0d3) [0x7fd287e640d3] ) 0-management: Lock for vol <vol> not held
> [2020-04-22 05:38:35.667275] W [MSGID: 106117] [glusterd-handler.c:6225:__glusterd_peer_rpc_notify] 0-management: Lock not released for <vol>
> (last 2 lines repeated for each volume)
>
> root@glusterDevVM2:~# egrep -v '0-socket.management' /var/log/glusterfs/glusterd.log
> [2020-04-22 05:51:57.493574] E [rpc-clnt.c:346:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x138)[0x7f30411dbda8] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcd97)[0x7f3040f81d97] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcebe)[0x7f3040f81ebe] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)[0x7f3040f82e93] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xea18)[0x7f3040f83a18] ))))) 0-management: forced unwinding frame type(Gluster MGMT Handshake) op(MGMT-VERS(1)) called at 2020-04-22 05:51:57.483579 (xid=0x563)
> [2020-04-22 05:51:57.493623] E [MSGID: 106167] [glusterd-handshake.c:2040:__glusterd_mgmt_hndsk_version_cbk] 0-management: Error through RPC layer, retry again later
> [2020-04-22 05:52:00.501474] I [socket.c:4347:ssl_setup_connection_params] 0-management: SSL support on the I/O path is ENABLED
> [2020-04-22 05:52:00.501542] I [socket.c:4350:ssl_setup_connection_params] 0-management: SSL support for glusterd is ENABLED
> [2020-04-22 05:52:00.501569] I [socket.c:4360:ssl_setup_connection_params] 0-management: using certificate depth 1
> [2020-04-22 05:52:00.983720] I [MSGID: 106004] [glusterd-handler.c:6204:__glusterd_peer_rpc_notify] 0-management: Peer <glusterDevVM1> (<e2263e4d-a307-45d5-9cec-e1791f7a45fb>), in state <Peer in Cluster>, has disconnected from glusterd.
> [2020-04-22 05:52:00.983886] W [glusterd-locks.c:796:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x22119) [0x7f303b5e3119] -->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x2caae) [0x7f303b5edaae] -->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0xdf0d3) [0x7f303b6a00d3] ) 0-management: Lock for vol <vol> not held
> [2020-04-22 05:52:00.983909] W [MSGID: 106117] [glusterd-handler.c:6225:__glusterd_peer_rpc_notify] 0-management: Lock not released for <vol>
> (last 2 lines repeated for each volume)
>
> root@glusterDevVM3:~# egrep -v '0-socket.management' /var/log/glusterfs/glusterd.log
> [2020-04-22 05:38:33.229959] I [MSGID: 106499] [glusterd-handler.c:4264:__glusterd_handle_status_volume] 0-management: Received status volume req for volume tmp
> [2020-04-22 05:41:33.230170] I [glusterd-locks.c:729:gd_mgmt_v3_unlock_timer_cbk] 0-management: unlock timer is cancelled for volume_type tmp_vol
> [2020-04-22 05:48:34.908289] E [rpc-clnt.c:183:call_bail] 0-management: bailing out frame type(glusterd mgmt v3), op(--(1)), xid = 0x108, unique = 918, sent = 2020-04-22 05:38:33.230268, timeout = 600 for 10.5.1.7:24007
> [2020-04-22 05:48:34.908339] E [MSGID: 106115] [glusterd-mgmt.c:117:gd_mgmt_v3_collate_errors] 0-management: Locking failed on glusterDevVM1. Please check log file for details.
> [2020-04-22 05:48:40.288539] E [rpc-clnt.c:183:call_bail] 0-management: bailing out frame type(glusterd mgmt v3), op(--(1)), xid = 0x27, unique = 917, sent = 2020-04-22 05:38:33.230258, timeout = 600 for 10.5.1.8:24007
> [2020-04-22 05:48:40.288568] E [MSGID: 106115] [glusterd-mgmt.c:117:gd_mgmt_v3_collate_errors] 0-management: Locking failed on glusterDevVM2. Please check log file for details.
> [2020-04-22 05:48:40.288631] E [MSGID: 106150] [glusterd-syncop.c:1918:gd_sync_task_begin] 0-management: Locking Peers Failed.
>
> I'm not familiar with pstack; when running it on node 3 (arbiter) I only get these few lines:
>
> root@glusterDevVM3:~# pstack 13700
>
> 13700: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
> (No symbols found)
> 0x7fafd747a6cd: ????
>
> Which Debian stretch package should I install?
>
> To be more explicit, I stopped glusterd on all 3 nodes, then restarted them sequentially in this order: node1, node3 (arbiter), then node2.
> Log files can be downloaded at
> https://www.dropbox.com/s/rcgcw7jrud2wkj1/glusterd-logs.tar.bz2?dl=0
>
> Thanks for your help.
>
> ------------------------------
> *From:* "Sanju Rakonde" <[email protected]>
> *To:* [email protected]
> *Cc:* "gluster-users" <[email protected]>
> *Sent:* Wednesday, April 22, 2020 07:23:52
> *Subject:* Re: [Gluster-users] never ending logging
>
> Hi,
>
> The email is talking about many issues. Let me ask a few questions to get the whole picture.
>
> 1. Are the peers in the Connected state now, or are they still in the Rejected state?
> 2. What led you to see the "Locking failed" messages? We would like to know whether there is a reproducer, and fix the issue if there is one.
> 3. The "Another transaction is in progress" message appears when an operation is already going on. Are you seeing it when no such transaction is in progress?
> 4. When did you hit the timeouts? Did you try to look at the pstack output of the glusterd process? If so, please share it.
>
> On Tue, Apr 21, 2020 at 7:08 PM <[email protected]> wrote:
>
>> Hi all.
>>
>> We're running a 3-node Gluster 7.3 cluster (2 + 1 arbiter). Yesterday node 2 was rejected from the cluster, and I applied the following steps to fix it:
>> https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Administrator%20Guide/Resolving%20Peer%20Rejected/
>> I also saw
>> https://docs.gluster.org/en/latest/Troubleshooting/troubleshooting-glusterd/
>> but that solution isn't applicable, as cluster.max-op-version doesn't exist and the op-versions are the same on all 3 nodes.
>>
>> After renewing the SSL certs and several restarts, all volumes came back online, but the glusterd log file on all 3 nodes is filled with nothing but the following 3 lines:
>>
>> [2020-04-21 13:05:19.478913] I [socket.c:4347:ssl_setup_connection_params] 0-socket.management: SSL support on the I/O path is ENABLED
>> [2020-04-21 13:05:19.478972] I [socket.c:4350:ssl_setup_connection_params] 0-socket.management: SSL support for glusterd is ENABLED
>> [2020-04-21 13:05:19.478986] I [socket.c:4360:ssl_setup_connection_params] 0-socket.management: using certificate depth 1
>>
>> Moreover, I get "Locking failed", "Another transaction is in progress" and "Error : Request timed out" from the gluster volume status volxxx command.
>> All SSL certs on the clients have also been renewed, and all volumes were remounted. All 3 nodes were alternately restarted (glusterd) and rebooted.
>>
>> The cluster is not in a production environment, but there are about ~250 clients for ~75 volumes. I don't know how to troubleshoot and fix this problem; if anyone has an idea.
>> ________
>>
>> Community Meeting Calendar:
>>
>> Schedule -
>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> Bridge: https://bluejeans.com/441850968
>>
>> Gluster-users mailing list
>> [email protected]
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
> --
> Thanks,
> Sanju

--
Thanks,
Sanju
