Hi Mehmet,

Yes, I only see the "RPC: Could not send backchannel reply error: -110" message on the NFS client side.
Thanks for opening the new bug report #2084023. Up to now I have had 22 NFS crashes on our production Linux cluster and some self-generated crashes on my test setup. If you need more information, please let me know. I would be very happy to help eliminate the problem as soon as possible.

I wrote some Python scripts to reproduce the crash in my test setup, but most of the time it took too long. Do you know what I have to do, in addition to high I/O load and many processes, to reproduce the problem? I also need this reproduction procedure for our internal approval. I'm currently trying to find a stable workaround.

Best regards
Peter Schubert

-----Original Message-----
From: nore...@launchpad.net <nore...@launchpad.net> On behalf of Mehmet Basaran
Sent: Wednesday, 9 October 2024 12:09
To: schub...@iap-kborn.de
Subject: [Bug 2062568] Re: nfsd gets unresponsive after some hours of operation

Hi Peter,

Thanks for trying out the nfs patch. However, regarding "RPC: Could not send backchannel reply error: -110": I suspected this problem to be on the nfs server side rather than the nfs client. Were you having this error on the client side before trying out the patched nfs kernel?

"rcu: INFO: rcu_sched self-detected stall on CPU" seems to be a different issue. I found a possible fix for it and plan on releasing a test kernel. But, since this is a different issue, let's continue the discussion here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2084023

--
You received this bug notification because you are subscribed to a duplicate bug report (2083502).
https://bugs.launchpad.net/bugs/2062568

Title: nfsd gets unresponsive after some hours of operation

Status in linux package in Ubuntu: In Progress
Status in nfs-utils package in Ubuntu: Incomplete
Status in linux source package in Noble: In Progress
Status in nfs-utils source package in Noble: Incomplete

Bug description:
I installed the 24.04 Beta on two test machines that were running 22.04 without issues before.
One of them exports two volumes that are mounted by the other machine, which primarily uses them as a secondary storage for ccache. After being up for a couple of hours (happened twice since yesterday evening) it seems that nfsd on the machine exporting the volumes hangs on something.

From dmesg on the server (repeated a few times):

[11183.290548] INFO: task nfsd:1419 blocked for more than 1228 seconds.
[11183.290558] Not tainted 6.8.0-22-generic #22-Ubuntu
[11183.290563] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11183.290582] task:nfsd state:D stack:0 pid:1419 tgid:1419 ppid:2 flags:0x00004000
[11183.290587] Call Trace:
[11183.290602] <TASK>
[11183.290606] __schedule+0x27c/0x6b0
[11183.290612] schedule+0x33/0x110
[11183.290615] schedule_timeout+0x157/0x170
[11183.290619] wait_for_completion+0x88/0x150
[11183.290623] __flush_workqueue+0x140/0x3e0
[11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
[11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
[11183.290744] nfsd4_proc_compound+0x3af/0x770 [nfsd]
[11183.290798] nfsd_dispatch+0xd4/0x220 [nfsd]
[11183.290851] svc_process_common+0x44d/0x710 [sunrpc]
[11183.290924] ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[11183.290976] svc_process+0x132/0x1b0 [sunrpc]
[11183.291041] svc_handle_xprt+0x4d3/0x5d0 [sunrpc]
[11183.291105] svc_recv+0x18b/0x2e0 [sunrpc]
[11183.291168] ? __pfx_nfsd+0x10/0x10 [nfsd]
[11183.291220] nfsd+0x8b/0xe0 [nfsd]
[11183.291270] kthread+0xef/0x120
[11183.291274] ? __pfx_kthread+0x10/0x10
[11183.291276] ret_from_fork+0x44/0x70
[11183.291279] ? __pfx_kthread+0x10/0x10
[11183.291281] ret_from_fork_asm+0x1b/0x30
[11183.291286] </TASK>

From dmesg on the client (repeated a number of times):

[ 6596.911785] RPC: Could not send backchannel reply error: -110
[ 6596.972490] RPC: Could not send backchannel reply error: -110
[ 6837.281307] RPC: Could not send backchannel reply error: -110

ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: nfs-kernel-server 1:2.6.4-3ubuntu5
ProcVersionSignature: Ubuntu 6.8.0-22.22-generic 6.8.1
Uname: Linux 6.8.0-22-generic x86_64
.etc.request-key.d.id_resolver.conf: create id_resolver * * /usr/sbin/nfsidmap -t 600 %k %d
ApportVersion: 2.28.1-0ubuntu1
Architecture: amd64
CasperMD5CheckResult: pass
Date: Fri Apr 19 14:10:25 2024
InstallationDate: Installed on 2024-04-16 (3 days ago)
InstallationMedia: Ubuntu-Server 24.04 LTS "Noble Numbat" - Beta amd64 (20240410.1)
NFSMounts:
NFSv4Mounts:
ProcEnviron:
 LANG=en_US.UTF-8
 PATH=(custom, no user)
 SHELL=/bin/bash
 TERM=xterm-256color
 XDG_RUNTIME_DIR=<set>
SourcePackage: nfs-utils
UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2062568/+subscriptions

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2062568

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
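
For anyone collecting logs for this bug, the two symptoms quoted above (the hung nfsd task on the server and the backchannel reply error on the client) can be counted in saved dmesg output with a small helper. The following is only a sketch: the function name summarize_dmesg, the regular expressions, and the sample text are illustrative assumptions modelled on the log lines in this report, not part of any Ubuntu or nfs-utils tooling. The code -110 is decoded as ETIMEDOUT, which is what errno 110 means on Linux.

```python
import re
from collections import Counter

# Linux errno names. 110 is ETIMEDOUT on Linux; it is hardcoded here
# because the numeric value differs on other platforms.
LINUX_ERRNO_NAMES = {110: "ETIMEDOUT"}

# Client-side symptom: "RPC: Could not send backchannel reply error: -110"
BACKCHANNEL_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+RPC: Could not send backchannel reply error: -(?P<err>\d+)"
)

# Server-side symptom: "INFO: task nfsd:1419 blocked for more than 1228 seconds."
HUNG_TASK_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def summarize_dmesg(text):
    """Count backchannel errors per errno name and list hung-task reports."""
    backchannel = Counter()
    hung_tasks = []
    for line in text.splitlines():
        match = BACKCHANNEL_RE.search(line)
        if match:
            code = int(match.group("err"))
            backchannel[LINUX_ERRNO_NAMES.get(code, str(code))] += 1
            continue
        match = HUNG_TASK_RE.search(line)
        if match:
            hung_tasks.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return {"backchannel_errors": dict(backchannel), "hung_tasks": hung_tasks}


# Sample lines copied from the bug description.
SAMPLE = """\
[11183.290548] INFO: task nfsd:1419 blocked for more than 1228 seconds.
[ 6596.911785] RPC: Could not send backchannel reply error: -110
[ 6596.972490] RPC: Could not send backchannel reply error: -110
[ 6837.281307] RPC: Could not send backchannel reply error: -110
"""

if __name__ == "__main__":
    print(summarize_dmesg(SAMPLE))
```

Running it against the output of dmesg from both machines gives a quick per-host count, which may help when comparing a patched kernel against an unpatched one.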