[ceph-users] Re: "ceph orch daemon add osd" deploys broken OSD
Hello everyone, any ideas? Even small hints would help a lot!
[ceph-users] Re: cephadm: daemon osd.x on yyy is in error state
Did it help? Maybe you found a better solution?
[ceph-users] Re: cephadm: daemon osd.x on yyy is in error state
Well, I've replaced the failed drives and that cleared the error. Arguably, it was a better solution :-)

/Z

On Sat, 6 Apr 2024 at 10:13, wrote:
> Did it help? Maybe you found a better solution?
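For the archives, this is roughly the cephadm replacement flow I'd expect to go with a drive swap like that; only a sketch, and the OSD id, hostname and device below are placeholders, not taken from this cluster:

    # Remove the failed OSD but keep its id reserved for the replacement,
    # zapping the old device once the daemon is gone (id is an example)
    ceph orch osd rm 12 --replace --zap
    ceph orch osd rm status

    # After swapping the physical drive, either let an existing drivegroup
    # spec pick it up or add it explicitly (host/device are examples)
    ceph orch device ls --refresh
    ceph orch daemon add osd ceph-host-01:/dev/sdX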
[ceph-users] Re: Issue about execute "ceph fs new"
Thanks for your information. I tried creating some new MDS pods, but it seems to be the same issue.

[root@vm-01 examples]# cat filesystem.yaml | grep activeCount
    activeCount: 3

[root@vm-01 examples]# kubectl get pod -nrook-ceph | grep mds
rook-ceph-mds-myfs-a-6d46fcfd4c-lxc8m    2/2   Running   0   11m
rook-ceph-mds-myfs-b-755685bcfb-mnfbv    2/2   Running   0   11m
rook-ceph-mds-myfs-c-75c78b68bf-h5m9b    2/2   Running   0   9m13s
rook-ceph-mds-myfs-d-6b595c4c98-tq6rl    2/2   Running   0   9m12s
rook-ceph-mds-myfs-e-5dbfb9445f-4hbrn    2/2   Running   0   117s
rook-ceph-mds-myfs-f-7957c55bc6-xtczr    2/2   Running   0   116s

[root@vm-01 examples]# kubectl exec -it `kubectl get pod -nrook-ceph | grep tools | awk -F ' ' '{print $1}'` -n rook-ceph bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.

bash-4.4$ ceph -s
  cluster:
    id:     de9af3fe-d3b1-4a4b-bf61-929a990295f6
    health: HEALTH_ERR
            1 filesystem is offline
            1 filesystem is online with fewer MDS than max_mds

  services:
    mon: 3 daemons, quorum a,b,d (age 74m)
    mgr: a(active, since 5d), standbys: b
    mds: 3/3 daemons up, 3 hot standby
    osd: 3 osds: 3 up (since 84m), 3 in (since 6d)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 2/2 healthy
    pools:   14 pools, 233 pgs
    objects: 633 objects, 450 MiB
    usage:   2.0 GiB used, 208 GiB / 210 GiB avail
    pgs:     233 active+clean

  io:
    client: 19 KiB/s rd, 0 B/s wr, 21 op/s rd, 10 op/s wr

bash-4.4$ ceph health detail
HEALTH_ERR 1 filesystem is offline; 1 filesystem is online with fewer MDS than max_mds
[ERR] MDS_ALL_DOWN: 1 filesystem is offline
    fs kingcephfs is offline because no MDS is active for it.
[WRN] MDS_UP_LESS_THAN_MAX: 1 filesystem is online with fewer MDS than max_mds
    fs kingcephfs has 0 MDS online, but wants 1
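For completeness, this is roughly how I plan to check which filesystem the standby daemons are actually allowed to join; just my best guess at a checklist, nothing verified yet:

    # Show both filesystems, their max_mds and which MDS are active/standby
    ceph fs dump
    ceph fs status

    # Check whether standbys are pinned to a specific filesystem
    ceph config dump | grep mds_join_fs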
[ceph-users] Re: Issue about execute "ceph fs new"
I removed the default fs and then it worked, but port 6789 is still not reachable via telnet.

ceph fs fail myfs
ceph fs rm myfs --yes-i-really-mean-it

bash-4.4$ ceph fs ls
name: kingcephfs, metadata pool: cephfs-king-metadata, data pools: [cephfs-king-data ]

bash-4.4$ ceph -s
  cluster:
    id:     de9af3fe-d3b1-4a4b-bf61-929a990295f6
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,d (age 90m)
    mgr: a(active, since 5d), standbys: b
    mds: 1/1 daemons up, 5 standby
    osd: 3 osds: 3 up (since 100m), 3 in (since 6d)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   14 pools, 233 pgs
    objects: 633 objects, 450 MiB
    usage:   2.0 GiB used, 208 GiB / 210 GiB avail
    pgs:     233 active+clean
[ceph-users] question regarding access cephFS from external network.
Dear all, I have a question about CephFS: port 6789 is exposed by a Service of type ClusterIP, so how can I access it from an external network?

[root@vm-01 examples]# kubectl get svc -nrook-ceph
NAME                                     TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
rook-ceph-mgr                            ClusterIP   10.106.49.230    <none>        9283/TCP            6d23h
rook-ceph-mgr-dashboard                  ClusterIP   10.96.37.100     <none>        8443/TCP            6d23h
rook-ceph-mgr-dashboard-external-https   NodePort    10.108.78.191    <none>        8443:31082/TCP      6d21h
rook-ceph-mon-a                          ClusterIP   10.107.62.113    <none>        6789/TCP,3300/TCP   3h12m
rook-ceph-mon-b                          ClusterIP   10.103.94.71     <none>        6789/TCP,3300/TCP   3h12m
rook-ceph-mon-d                          ClusterIP   10.110.210.113   <none>        6789/TCP,3300/TCP   3h1m
rook-ceph-rgw-my-store                   ClusterIP   10.98.150.97     <none>        80/TCP              4d6h
rook-ceph-rgw-my-store-external          NodePort    10.111.249.203   <none>        80:30514/TCP        4d5h
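In case it's relevant, these are the directions I'm considering; only a sketch based on my reading of the Rook docs, not something I've verified, and the host-networking change in particular is an assumption on my part since it is disruptive on an existing cluster:

    # Quick test from a workstation with kubectl access: forward a mon port locally
    kubectl -n rook-ceph port-forward svc/rook-ceph-mon-a 6789:6789

    # For real external clients, switch the Rook CephCluster to host networking so
    # the mons listen on the node IPs (disruptive; mons need to be recreated)
    kubectl -n rook-ceph patch cephcluster rook-ceph --type merge \
      -p '{"spec":{"network":{"provider":"host"}}}'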
[ceph-users] Re: Impact of Slow OPS?
Hi,

Do slow ops impact data integrity => No
Can I generally ignore it => No :)

This means that some client transactions are blocked for 120 sec (that's a lot). This could be a lock on the client side (CephFS, essentially), an incident on the infrastructure side (a disk about to fail, network instability, etc.), ...

When this happens, you need to look at the blocked requests. If you systematically see the same OSD ID, then look at dmesg and the SMART data of the disk.

This can also be an architectural problem (for example, a high IOPS load with osdmap on HDD, all multiplied by the erasure code).

*David*

Le ven. 5 avr. 2024 à 19:42, adam.ther a écrit :
> Hello,
>
> Do slow ops impact data integrity or can I generally ignore it? I'm
> loading 3 hosts over a 10 Gb link and it is saturating the disks or the OSDs.
>
>     2024-04-05T15:33:10.625922+ mon.CEPHADM-1 [WRN] Health check
>     update: 3 slow ops, oldest one blocked for 117 sec, daemons
>     [osd.0,osd.13,osd.14,osd.17,osd.3,osd.4,osd.9] have slow ops.
>     (SLOW_OPS)
>
>     2024-04-05T15:33:15.628271+ mon.CEPHADM-1 [WRN] Health check
>     update: 2 slow ops, oldest one blocked for 123 sec, daemons
>     [osd.0,osd.1,osd.14,osd.17,osd.3,osd.4,osd.9] have slow ops. (SLOW_OPS)
>
> I guess more to the point, what is the impact here?
>
> Thanks,
>
> Adam
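PS: to make the "look at the blocked requests" part concrete, a rough sketch of where I'd start; the OSD id and device name are placeholders, so adapt them to whatever shows up in your SLOW_OPS warnings:

    # Which ops are currently stuck, and on which OSDs
    ceph health detail
    ceph daemon osd.13 dump_ops_in_flight      # run on the host carrying osd.13
    ceph daemon osd.13 dump_historic_ops       # recently completed (slow) ops

    # Then check the underlying disk on that host
    dmesg -T | grep -iE 'error|reset|timeout'
    smartctl -a /dev/sdX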
[ceph-users] NFS never recovers after slow ops
Hi

Cephadm Reef 18.2.1

Started draining 5 18-20 TB HDD OSDs (DB/WAL on NVMe) on one host. Even with osd_max_backfills at 1 the OSDs get slow ops from time to time, which seems odd as we recently did a huge reshuffle[1] involving the same host without seeing these slow ops. I guess one difference is that the disks were then only getting writes when they were added, and now they are only being used for reads as they are being drained.

The slow ops eventually go away, but I'm seeing stuck nfsd threads from RBD exports lingering on forever. I have to reboot the NFS server to get it going again; restarting nfs-server also just hangs.

Here's a stack trace from dmesg:

"
[Sat Apr 6 17:44:52 2024] INFO: task nfsd:52502 blocked for more than 1245 seconds.
[Sat Apr 6 17:44:52 2024]       Not tainted 5.14.0-362.8.1.test2.el9_3.x86_64 #1
[Sat Apr 6 17:44:52 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Apr 6 17:44:52 2024] task:nfsd    state:D stack:0    pid:52502 ppid:2    flags:0x4000
[Sat Apr 6 17:44:52 2024] Call Trace:
[Sat Apr 6 17:44:52 2024]  <TASK>
[Sat Apr 6 17:44:52 2024]  __schedule+0x20a/0x550
[Sat Apr 6 17:44:52 2024]  schedule+0x2d/0x70
[Sat Apr 6 17:44:52 2024]  schedule_timeout+0x11f/0x160
[Sat Apr 6 17:44:52 2024]  ? xfs_trans_read_buf_map+0x133/0x300 [xfs]
[Sat Apr 6 17:44:52 2024]  ? xfs_btree_read_buf_block.constprop.0+0x9a/0xd0 [xfs]
[Sat Apr 6 17:44:52 2024]  __down_common+0x11f/0x200
[Sat Apr 6 17:44:52 2024]  ? xfs_btree_read_buf_block.constprop.0+0x30/0xd0 [xfs]
[Sat Apr 6 17:44:52 2024]  down+0x43/0x60
[Sat Apr 6 17:44:52 2024]  xfs_buf_lock+0x2d/0xe0 [xfs]
[Sat Apr 6 17:44:52 2024]  xfs_buf_find_lock+0x45/0xf0 [xfs]
[Sat Apr 6 17:44:52 2024]  xfs_buf_lookup.constprop.0+0xe4/0x170 [xfs]
[Sat Apr 6 17:44:52 2024]  xfs_buf_get_map+0xc1/0x3a0 [xfs]
[Sat Apr 6 17:44:52 2024]  xfs_buf_read_map+0x54/0x290 [xfs]
[Sat Apr 6 17:44:52 2024]  ? xfs_imap_to_bp+0x4e/0x70 [xfs]
[Sat Apr 6 17:44:52 2024]  ? xfs_imap_lookup+0x173/0x1d0 [xfs]
[Sat Apr 6 17:44:52 2024]  xfs_trans_read_buf_map+0x133/0x300 [xfs]
[Sat Apr 6 17:44:52 2024]  ? xfs_imap_to_bp+0x4e/0x70 [xfs]
[Sat Apr 6 17:44:52 2024]  xfs_imap_to_bp+0x4e/0x70 [xfs]
[Sat Apr 6 17:44:52 2024]  xfs_iget_cache_miss+0xa2/0x370 [xfs]
[Sat Apr 6 17:44:52 2024]  xfs_iget+0x19f/0x270 [xfs]
[Sat Apr 6 17:44:52 2024]  ? __pfx_nfsd_acceptable+0x10/0x10 [nfsd]
[Sat Apr 6 17:44:52 2024]  xfs_nfs_get_inode.isra.0+0x5e/0xa0 [xfs]
[Sat Apr 6 17:44:52 2024]  xfs_fs_fh_to_dentry+0x48/0xb0 [xfs]
[Sat Apr 6 17:44:52 2024]  exportfs_decode_fh_raw+0x60/0x2e0
[Sat Apr 6 17:44:52 2024]  ? exp_find_key+0x99/0x1e0 [nfsd]
[Sat Apr 6 17:44:52 2024]  ? rcu_nocb_try_bypass+0x4d/0x440
[Sat Apr 6 17:44:52 2024]  ? __kmalloc+0x19b/0x370
[Sat Apr 6 17:44:52 2024]  ? __pfx_put_cred_rcu+0x10/0x10
[Sat Apr 6 17:44:52 2024]  ? call_rcu+0x114/0x310
[Sat Apr 6 17:44:52 2024]  nfsd_set_fh_dentry+0x2b9/0x470 [nfsd]
[Sat Apr 6 17:44:52 2024]  fh_verify+0x1b3/0x2f0 [nfsd]
[Sat Apr 6 17:44:52 2024]  nfsd4_putfh+0x3e/0x70 [nfsd]
[Sat Apr 6 17:44:52 2024]  nfsd4_proc_compound+0x44e/0x700 [nfsd]
[Sat Apr 6 17:44:52 2024]  nfsd_dispatch+0x53/0x170 [nfsd]
[Sat Apr 6 17:44:52 2024]  svc_process_common+0x357/0x640 [sunrpc]
[Sat Apr 6 17:44:52 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[Sat Apr 6 17:44:52 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
[Sat Apr 6 17:44:52 2024]  svc_process+0x12d/0x180 [sunrpc]
[Sat Apr 6 17:44:52 2024]  nfsd+0xd5/0x190 [nfsd]
[Sat Apr 6 17:44:52 2024]  kthread+0xe0/0x100
[Sat Apr 6 17:44:52 2024]  ? __pfx_kthread+0x10/0x10
[Sat Apr 6 17:44:52 2024]  ret_from_fork+0x2c/0x50
[Sat Apr 6 17:44:52 2024]  </TASK>
"

Stack:

"
[root@cogsworth ~]# cat /proc/52502/stack
[<0>] xfs_buf_lock+0x2d/0xe0 [xfs]
[<0>] xfs_buf_find_lock+0x45/0xf0 [xfs]
[<0>] xfs_buf_lookup.constprop.0+0xe4/0x170 [xfs]
[<0>] xfs_buf_get_map+0xc1/0x3a0 [xfs]
[<0>] xfs_buf_read_map+0x54/0x290 [xfs]
[<0>] xfs_trans_read_buf_map+0x133/0x300 [xfs]
[<0>] xfs_imap_to_bp+0x4e/0x70 [xfs]
[<0>] xfs_iget_cache_miss+0xa2/0x370 [xfs]
[<0>] xfs_iget+0x19f/0x270 [xfs]
[<0>] xfs_nfs_get_inode.isra.0+0x5e/0xa0 [xfs]
[<0>] xfs_fs_fh_to_dentry+0x48/0xb0 [xfs]
[<0>] exportfs_decode_fh_raw+0x60/0x2e0
[<0>] nfsd_set_fh_dentry+0x2b9/0x470 [nfsd]
[<0>] fh_verify+0x1b3/0x2f0 [nfsd]
[<0>] nfsd4_putfh+0x3e/0x70 [nfsd]
[<0>] nfsd4_proc_compound+0x44e/0x700 [nfsd]
[<0>] nfsd_dispatch+0x53/0x170 [nfsd]
[<0>] svc_process_common+0x357/0x640 [sunrpc]
[<0>] svc_process+0x12d/0x180 [sunrpc]
[<0>] nfsd+0xd5/0x190 [nfsd]
[<0>] kthread+0xe0/0x100
[<0>] ret_from_fork+0x2c/0x50
"

The nfsd threads do not recover even with nobackfill set, so the cluster is essentially idle:

"
[root@lazy ~]# ceph -s
  cluster:
    id:     X
    health: HEALTH_ERR
            nobackfill,noscrub,nodeep-scrub flag(s) set
            1 scrub errors
            Possible data damage: 1 pg inconsistent
            631 pgs not deep-scrubbed in time
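For context, this is roughly the shape of the drain and the throttling involved; treat it as working notes rather than exactly what was run, the OSD ids are examples, and on Reef with the mClock scheduler osd_max_backfills is, as far as I understand, only honoured when the override flag is set:

    # Drain the OSDs via the orchestrator (ids are examples)
    ceph orch osd rm 101 102 103 104 105
    ceph orch osd rm status

    # Keep recovery/backfill traffic low; with mClock this needs the override flag
    ceph config set osd osd_mclock_override_recovery_settings true
    ceph config set osd osd_max_backfills 1

    # Watch progress
    ceph -s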
[ceph-users] Re: Issue about execute "ceph fs new"
Did you enable multi-active MDS? Can you please share 'ceph fs dump'? Port 6789 is the MON port (v1; v2 is 3300). If you haven't enabled multi-active, run:

ceph fs flag set enable_multiple

Zitat von elite_...@163.com:

> I removed the default fs and then it worked, but port 6789 is still not
> reachable via telnet.
>
> ceph fs fail myfs
> ceph fs rm myfs --yes-i-really-mean-it
> [...]
[ceph-users] Re: Issue about execute "ceph fs new"
Sorry, I hit send too early. To enable multi-active MDS the full command is:

ceph fs flag set enable_multiple true

Zitat von Eugen Block:

> Did you enable multi-active MDS? Can you please share 'ceph fs dump'?
> Port 6789 is the MON port (v1; v2 is 3300). If you haven't enabled
> multi-active, run:
>
> ceph fs flag set enable_multiple
> [...]
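And to check the result afterwards, something like this should do; a quick sketch only, <mon-ip> is a placeholder for one of your mon addresses:

    # Set the flag, then it should be visible near the top of the FSMap dump
    ceph fs flag set enable_multiple true
    ceph fs dump | head

    # 6789 is the v1 MON port, 3300 the v2 port; test both against a mon address
    ceph mon dump
    nc -vz <mon-ip> 6789
    nc -vz <mon-ip> 3300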
[ceph-users] Re: Impact of Slow OPS?
ISTR that the Ceph slow op threshold defaults to 30 or 32 seconds. Naturally, an op over the threshold often means there are more below the reporting threshold. 120 s, I think, is the default Linux op timeout.

> On Apr 6, 2024, at 10:53 AM, David C. wrote:
>
> Hi,
>
> Do slow ops impact data integrity => No
> Can I generally ignore it => No :)
> [...]
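The complaint threshold is also tunable if anyone wants to check what their cluster is actually using; a minimal sketch, assuming a host with the admin keyring:

    # Default is 30 seconds; ops older than this are reported as slow
    ceph config get osd osd_op_complaint_time

    # Can be raised temporarily if the warnings are just noise during heavy backfill
    ceph config set osd osd_op_complaint_time 60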
[ceph-users] Re: NFS never recovers after slow ops
On 06-04-2024 18:10, Torkil Svensgaard wrote:
> Hi
>
> Cephadm Reef 18.2.1
>
> Started draining 5 18-20 TB HDD OSDs (DB/WAL on NVMe) on one host. Even
> with osd_max_backfills at 1 the OSDs get slow ops from time to time which
> seems odd as we recently did a huge reshuffle[1] involving the same host
> without seeing these slow ops.
> [...]
[ceph-users] Re: NFS never recovers after slow ops
Hi Torkil,

I assume the affected OSDs were the ones with slow requests, no? You should still see them in some of the logs (mon, mgr).

Zitat von Torkil Svensgaard:

> On 06-04-2024 18:10, Torkil Svensgaard wrote:
>> Hi
>>
>> Cephadm Reef 18.2.1
>>
>> Started draining 5 18-20 TB HDD OSDs (DB/WAL on NVMe) on one host. Even
>> with osd_max_backfills at 1 the OSDs get slow ops from time to time which
>> seems odd as we recently did a huge reshuffle[1] involving the same host
>> without seeing these slow ops.
>> [...]
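To be concrete, something along these lines usually turns them up; the fsid and hostname are placeholders, and the right place depends on whether the cluster logs to journald (cephadm default) or to files:

    # Recent cluster log entries kept by the mons
    ceph log last 200 info cluster | grep -i slow

    # With cephadm and default journald logging
    journalctl -u ceph-<fsid>@mon.<host> | grep -i 'slow ops'

    # Or, if file logging is enabled
    grep -i 'slow ops' /var/log/ceph/<fsid>/ceph.log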