[ceph-users] Re: "ceph orch daemon add osd" deploys broken OSD

2024-04-06 Thread service . plant
Hello everyone,
any ideas? Even small hints would help a lot!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm: daemon osd.x on yyy is in error state

2024-04-06 Thread service . plant
Did it help? Maybe you found a better solution?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm: daemon osd.x on yyy is in error state

2024-04-06 Thread Zakhar Kirpichenko
Well, I've replaced the failed drives and that cleared the error. Arguably,
it was a better solution :-)

/Z

On Sat, 6 Apr 2024 at 10:13,  wrote:

> Did it help? Maybe you found a better solution?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Issue about execute "ceph fs new"

2024-04-06 Thread elite_stu
Thanks for the information. I tried spinning up some new MDS pods, but it seems
to be the same issue.

[root@vm-01 examples]# cat filesystem.yaml | grep activeCount
activeCount: 3
[root@vm-01 examples]#
[root@vm-01 examples]# kubectl get pod -nrook-ceph | grep mds
rook-ceph-mds-myfs-a-6d46fcfd4c-lxc8m   2/2   Running   0   11m
rook-ceph-mds-myfs-b-755685bcfb-mnfbv   2/2   Running   0   11m
rook-ceph-mds-myfs-c-75c78b68bf-h5m9b   2/2   Running   0   9m13s
rook-ceph-mds-myfs-d-6b595c4c98-tq6rl   2/2   Running   0   9m12s
rook-ceph-mds-myfs-e-5dbfb9445f-4hbrn   2/2   Running   0   117s
rook-ceph-mds-myfs-f-7957c55bc6-xtczr   2/2   Running   0   116s
[root@vm-01 examples]#
[root@vm-01 examples]# kubectl exec -it `kubectl get pod -nrook-ceph | grep tools | awk -F ' ' '{print $1}'` -n rook-ceph bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future 
version. Use kubectl exec [POD] -- [COMMAND] instead.
bash-4.4$
bash-4.4$ ceph -s
  cluster:
id: de9af3fe-d3b1-4a4b-bf61-929a990295f6
health: HEALTH_ERR
1 filesystem is offline
1 filesystem is online with fewer MDS than max_mds

  services:
mon: 3 daemons, quorum a,b,d (age 74m)
mgr: a(active, since 5d), standbys: b
mds: 3/3 daemons up, 3 hot standby
osd: 3 osds: 3 up (since 84m), 3 in (since 6d)
rgw: 1 daemon active (1 hosts, 1 zones)

  data:
volumes: 2/2 healthy
pools:   14 pools, 233 pgs
objects: 633 objects, 450 MiB
usage:   2.0 GiB used, 208 GiB / 210 GiB avail
pgs: 233 active+clean

  io:
client:   19 KiB/s rd, 0 B/s wr, 21 op/s rd, 10 op/s wr

bash-4.4$
bash-4.4$
bash-4.4$ ceph health detail
HEALTH_ERR 1 filesystem is offline; 1 filesystem is online with fewer MDS than 
max_mds
[ERR] MDS_ALL_DOWN: 1 filesystem is offline
fs kingcephfs is offline because no MDS is active for it.
[WRN] MDS_UP_LESS_THAN_MAX: 1 filesystem is online with fewer MDS than max_mds
fs kingcephfs has 0 MDS online, but wants 1
bash-4.4$
bash-4.4$
bash-4.4$
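
One thing I guess I should check (an assumption on my part, I haven't confirmed it): Rook only creates MDS pods for filesystems defined as CephFilesystem resources, i.e. myfs, so a filesystem created by hand with "ceph fs new" (kingcephfs) might simply have no MDS assigned to it. From the toolbox, something like:

ceph fs status
ceph fs dump | grep -A 5 kingcephfs

should show which daemons serve which filesystem. If all the MDS daemons show up under myfs only, that would explain the MDS_ALL_DOWN for kingcephfs.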
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Issue about execute "ceph fs new"

2024-04-06 Thread elite_stu
I tried removing the default fs and then it works, but port 6789 is still not
reachable via telnet.

ceph fs fail myfs
ceph fs rm myfs --yes-i-really-mean-it

bash-4.4$
bash-4.4$ ceph fs ls

name: kingcephfs, metadata pool: cephfs-king-metadata, data pools: 
[cephfs-king-data ]
bash-4.4$
bash-4.4$
bash-4.4$ ceph -s
  cluster:
id: de9af3fe-d3b1-4a4b-bf61-929a990295f6
health: HEALTH_OK

  services:
mon: 3 daemons, quorum a,b,d (age 90m)
mgr: a(active, since 5d), standbys: b
mds: 1/1 daemons up, 5 standby
osd: 3 osds: 3 up (since 100m), 3 in (since 6d)
rgw: 1 daemon active (1 hosts, 1 zones)

  data:
volumes: 1/1 healthy
pools:   14 pools, 233 pgs
objects: 633 objects, 450 MiB
usage:   2.0 GiB used, 208 GiB / 210 GiB avail
pgs: 233 active+clean

bash-4.4$
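
On the 6789 part: as far as I understand that is the MON v1 port, and the ClusterIP services are only reachable from inside the Kubernetes cluster, so telnet from outside may simply be expected to fail. I can at least check which addresses the MONs advertise from the toolbox with:

ceph mon dump

(just a guess on my part that this is the right way to confirm it)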
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] question regarding access cephFS from external network.

2024-04-06 Thread elite_stu
Dears , 
I have a question: the CephFS/MON port 6789 is exposed by a Service of type
ClusterIP, so how can I access it from an external network?


[root@vm-01 examples]# kubectl get svc -nrook-ceph
NAME                                     TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
rook-ceph-mgr                            ClusterIP   10.106.49.230                  9283/TCP            6d23h
rook-ceph-mgr-dashboard                  ClusterIP   10.96.37.100                   8443/TCP            6d23h
rook-ceph-mgr-dashboard-external-https   NodePort    10.108.78.191                  8443:31082/TCP      6d21h
rook-ceph-mon-a                          ClusterIP   10.107.62.113                  6789/TCP,3300/TCP   3h12m
rook-ceph-mon-b                          ClusterIP   10.103.94.71                   6789/TCP,3300/TCP   3h12m
rook-ceph-mon-d                          ClusterIP   10.110.210.113                 6789/TCP,3300/TCP   3h1m
rook-ceph-rgw-my-store                   ClusterIP   10.98.150.97                   80/TCP              4d6h
rook-ceph-rgw-my-store-external          NodePort    10.111.249.203                 80:30514/TCP        4d5h
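
I was thinking of something similar to the existing NodePort services above (dashboard and rgw), e.g. a rough guess with a made-up service name:

kubectl -n rook-ceph expose service rook-ceph-mon-a --type=NodePort --name=rook-ceph-mon-a-external

but I'm not sure that is enough on its own, since as far as I understand Ceph clients learn the MON addresses from the monmap, which would still contain the ClusterIPs. Is host networking for the MONs (network.provider: host in the CephCluster CR) the usual way to do this?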
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Impact of Slow OPS?

2024-04-06 Thread David C.
Hi,

Do slow ops impact data integrity => No
Can I generally ignore it => No :)

This means that some client transactions are blocked for 120 sec (that's a
lot).
This could be a lock on the client side (CephFS, essentially), an incident
on the infrastructure side (a disk about to fail, network instability,
etc.), ...

When this happens, you need to look at the blocked requests.
If you systematically see the same OSD ID, then look at dmesg and the SMART
data for that disk.

This can also be an architectural problem (for example, a high IOPS load
with osdmap on HDD, all multiplied by the erasure code).
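
For reference, a rough sketch of the kind of checks I mean (standard Ceph and Linux tooling; adjust the OSD id and device to your environment):

ceph health detail                         # which OSDs are reporting SLOW_OPS
ceph daemon osd.0 dump_ops_in_flight       # on the OSD's host, inside its container
ceph daemon osd.0 dump_historic_slow_ops
dmesg -T | grep -iE 'error|reset|timeout'  # on the OSD host
smartctl -a /dev/sdX                       # SMART data for the suspect disk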

*David*


On Fri, 5 Apr 2024 at 19:42, adam.ther  wrote:

> Hello,
>
> Do slow ops impact data integrity or can I generally ignore it? I'm
> loading 3 hosts with a 10GB link and it is saturating the disks or the OSDs.
>
> 2024-04-05T15:33:10.625922+ mon.CEPHADM-1 [WRN] Health check
> update: 3 slow ops, oldest one blocked for 117 sec, daemons
> [osd.0,osd.13,osd.14,osd.17,osd.3,osd.4,osd.9] have slow ops.
> (SLOW_OPS)
>
> 2024-04-05T15:33:15.628271+ mon.CEPHADM-1 [WRN] Health check
> update: 2 slow ops, oldest one blocked for 123 sec, daemons
> [osd.0,osd.1,osd.14,osd.17,osd.3,osd.4,osd.9] have slow ops. (SLOW_OPS)
>
> I guess more to the point, what is the impact here?
>
> Thanks,
>
> Adam
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] NFS never recovers after slow ops

2024-04-06 Thread Torkil Svensgaard

Hi

Cephadm Reef 18.2.1

Started draining 5 18-20 TB HDD OSDs (DB/WAL on NVMe) on one host. Even 
with osd_max_backfills at 1 the OSDs get slow ops from time to time 
which seems odd as we recently did a huge reshuffle[1] involving the 
same host without seeing these slow ops.


I guess one difference is the disks were then only getting writes when 
they were added and now they are only being used for reads as they are 
being drained.


The slow ops eventually go away but I'm seeing stuck nfsd threads from 
RBD exports lingering on forever. I have to reboot the NFS server to get 
it going again, restarting nfs-server also just hangs.
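
Some checks that might narrow down whether this is stuck RBD I/O or purely nfsd/XFS (assuming kernel-mapped RBD images behind the exports; suggestions welcome):

rbd showmapped                                  # which images are mapped
cat /sys/kernel/debug/ceph/*/osdc               # in-flight OSD requests from the kernel client
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'   # tasks stuck in uninterruptible sleep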


Here's a stack trace from dmesg:

"
[Sat Apr  6 17:44:52 2024] INFO: task nfsd:52502 blocked for more than 
1245 seconds.
[Sat Apr  6 17:44:52 2024]   Not tainted 
5.14.0-362.8.1.test2.el9_3.x86_64 #1
[Sat Apr  6 17:44:52 2024] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Apr  6 17:44:52 2024] task:nfsdstate:D stack:0 
pid:52502 ppid:2  flags:0x4000

[Sat Apr  6 17:44:52 2024] Call Trace:
[Sat Apr  6 17:44:52 2024]  
[Sat Apr  6 17:44:52 2024]  __schedule+0x20a/0x550
[Sat Apr  6 17:44:52 2024]  schedule+0x2d/0x70
[Sat Apr  6 17:44:52 2024]  schedule_timeout+0x11f/0x160
[Sat Apr  6 17:44:52 2024]  ? xfs_trans_read_buf_map+0x133/0x300 [xfs]
[Sat Apr  6 17:44:52 2024]  ? 
xfs_btree_read_buf_block.constprop.0+0x9a/0xd0 [xfs]

[Sat Apr  6 17:44:52 2024]  __down_common+0x11f/0x200
[Sat Apr  6 17:44:52 2024]  ? 
xfs_btree_read_buf_block.constprop.0+0x30/0xd0 [xfs]

[Sat Apr  6 17:44:52 2024]  down+0x43/0x60
[Sat Apr  6 17:44:52 2024]  xfs_buf_lock+0x2d/0xe0 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_buf_find_lock+0x45/0xf0 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_buf_lookup.constprop.0+0xe4/0x170 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_buf_get_map+0xc1/0x3a0 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_buf_read_map+0x54/0x290 [xfs]
[Sat Apr  6 17:44:52 2024]  ? xfs_imap_to_bp+0x4e/0x70 [xfs]
[Sat Apr  6 17:44:52 2024]  ? xfs_imap_lookup+0x173/0x1d0 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_trans_read_buf_map+0x133/0x300 [xfs]
[Sat Apr  6 17:44:52 2024]  ? xfs_imap_to_bp+0x4e/0x70 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_imap_to_bp+0x4e/0x70 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_iget_cache_miss+0xa2/0x370 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_iget+0x19f/0x270 [xfs]
[Sat Apr  6 17:44:52 2024]  ? __pfx_nfsd_acceptable+0x10/0x10 [nfsd]
[Sat Apr  6 17:44:52 2024]  xfs_nfs_get_inode.isra.0+0x5e/0xa0 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_fs_fh_to_dentry+0x48/0xb0 [xfs]
[Sat Apr  6 17:44:52 2024]  exportfs_decode_fh_raw+0x60/0x2e0
[Sat Apr  6 17:44:52 2024]  ? exp_find_key+0x99/0x1e0 [nfsd]
[Sat Apr  6 17:44:52 2024]  ? rcu_nocb_try_bypass+0x4d/0x440
[Sat Apr  6 17:44:52 2024]  ? __kmalloc+0x19b/0x370
[Sat Apr  6 17:44:52 2024]  ? __pfx_put_cred_rcu+0x10/0x10
[Sat Apr  6 17:44:52 2024]  ? call_rcu+0x114/0x310
[Sat Apr  6 17:44:52 2024]  nfsd_set_fh_dentry+0x2b9/0x470 [nfsd]
[Sat Apr  6 17:44:52 2024]  fh_verify+0x1b3/0x2f0 [nfsd]
[Sat Apr  6 17:44:52 2024]  nfsd4_putfh+0x3e/0x70 [nfsd]
[Sat Apr  6 17:44:52 2024]  nfsd4_proc_compound+0x44e/0x700 [nfsd]
[Sat Apr  6 17:44:52 2024]  nfsd_dispatch+0x53/0x170 [nfsd]
[Sat Apr  6 17:44:52 2024]  svc_process_common+0x357/0x640 [sunrpc]
[Sat Apr  6 17:44:52 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[Sat Apr  6 17:44:52 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
[Sat Apr  6 17:44:52 2024]  svc_process+0x12d/0x180 [sunrpc]
[Sat Apr  6 17:44:52 2024]  nfsd+0xd5/0x190 [nfsd]
[Sat Apr  6 17:44:52 2024]  kthread+0xe0/0x100
[Sat Apr  6 17:44:52 2024]  ? __pfx_kthread+0x10/0x10
[Sat Apr  6 17:44:52 2024]  ret_from_fork+0x2c/0x50
[Sat Apr  6 17:44:52 2024]  
"

Stack:

"
[root@cogsworth ~]# cat  /proc/52502/stack
[<0>] xfs_buf_lock+0x2d/0xe0 [xfs]
[<0>] xfs_buf_find_lock+0x45/0xf0 [xfs]
[<0>] xfs_buf_lookup.constprop.0+0xe4/0x170 [xfs]
[<0>] xfs_buf_get_map+0xc1/0x3a0 [xfs]
[<0>] xfs_buf_read_map+0x54/0x290 [xfs]
[<0>] xfs_trans_read_buf_map+0x133/0x300 [xfs]
[<0>] xfs_imap_to_bp+0x4e/0x70 [xfs]
[<0>] xfs_iget_cache_miss+0xa2/0x370 [xfs]
[<0>] xfs_iget+0x19f/0x270 [xfs]
[<0>] xfs_nfs_get_inode.isra.0+0x5e/0xa0 [xfs]
[<0>] xfs_fs_fh_to_dentry+0x48/0xb0 [xfs]
[<0>] exportfs_decode_fh_raw+0x60/0x2e0
[<0>] nfsd_set_fh_dentry+0x2b9/0x470 [nfsd]
[<0>] fh_verify+0x1b3/0x2f0 [nfsd]
[<0>] nfsd4_putfh+0x3e/0x70 [nfsd]
[<0>] nfsd4_proc_compound+0x44e/0x700 [nfsd]
[<0>] nfsd_dispatch+0x53/0x170 [nfsd]
[<0>] svc_process_common+0x357/0x640 [sunrpc]
[<0>] svc_process+0x12d/0x180 [sunrpc]
[<0>] nfsd+0xd5/0x190 [nfsd]
[<0>] kthread+0xe0/0x100
[<0>] ret_from_fork+0x2c/0x50
"

The nfsd threads do not recover even with nobackfill set, so the cluster 
is essentially idle:


"
[root@lazy ~]# ceph -s
  cluster:
id: X
health: HEALTH_ERR
nobackfill,noscrub,nodeep-scrub flag(s) set
1 scrub errors
Possible data damage: 1 pg inconsistent
631 pgs not deep-scrubbed in time


[ceph-users] Re: Issue about execute "ceph fs new"

2024-04-06 Thread Eugen Block
Did you enable multi-active MDS? Can you please share 'ceph fs dump'?  
Port 6789 is the MON port (v1, v2 is 3300). If you haven't enabled  
multi-active, run:


ceph fs flag set enable_multiple

Zitat von elite_...@163.com:

I tried to remove the default fs then it works, but port 6789 still  
not able to telnet.


ceph fs fail myfs
ceph fs rm myfs --yes-i-really-mean-it

bash-4.4$
bash-4.4$ ceph fs ls

name: kingcephfs, metadata pool: cephfs-king-metadata, data pools:  
[cephfs-king-data ]

bash-4.4$
bash-4.4$
bash-4.4$ ceph -s
  cluster:
id: de9af3fe-d3b1-4a4b-bf61-929a990295f6
health: HEALTH_OK

  services:
mon: 3 daemons, quorum a,b,d (age 90m)
mgr: a(active, since 5d), standbys: b
mds: 1/1 daemons up, 5 standby
osd: 3 osds: 3 up (since 100m), 3 in (since 6d)
rgw: 1 daemon active (1 hosts, 1 zones)

  data:
volumes: 1/1 healthy
pools:   14 pools, 233 pgs
objects: 633 objects, 450 MiB
usage:   2.0 GiB used, 208 GiB / 210 GiB avail
pgs: 233 active+clean

bash-4.4$
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Issue about execute "ceph fs new"

2024-04-06 Thread Eugen Block

Sorry, I hit send too early; to enable multi-active MDS, the full command is:

ceph fs flag set enable_multiple true
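
To double-check that it took effect you can look at the fs dump, e.g.:

ceph fs dump | grep enable_multiple

and the same dump also shows which MDS daemons are assigned to which filesystem.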

Zitat von Eugen Block :

Did you enable multi-active MDS? Can you please share 'ceph fs  
dump'? Port 6789 is the MON port (v1, v2 is 3300). If you haven't  
enabled multi-active, run:


ceph fs flag set enable_multiple

Zitat von elite_...@163.com:

I tried to remove the default fs then it works, but port 6789 still  
not able to telnet.


ceph fs fail myfs
ceph fs rm myfs --yes-i-really-mean-it

bash-4.4$
bash-4.4$ ceph fs ls

name: kingcephfs, metadata pool: cephfs-king-metadata, data pools:  
[cephfs-king-data ]

bash-4.4$
bash-4.4$
bash-4.4$ ceph -s
 cluster:
   id: de9af3fe-d3b1-4a4b-bf61-929a990295f6
   health: HEALTH_OK

 services:
   mon: 3 daemons, quorum a,b,d (age 90m)
   mgr: a(active, since 5d), standbys: b
   mds: 1/1 daemons up, 5 standby
   osd: 3 osds: 3 up (since 100m), 3 in (since 6d)
   rgw: 1 daemon active (1 hosts, 1 zones)

 data:
   volumes: 1/1 healthy
   pools:   14 pools, 233 pgs
   objects: 633 objects, 450 MiB
   usage:   2.0 GiB used, 208 GiB / 210 GiB avail
   pgs: 233 active+clean

bash-4.4$
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Impact of Slow OPS?

2024-04-06 Thread Anthony D'Atri
ISTR that the Ceph slow op threshold defaults to 30 or 32 seconds.   Naturally 
an op over the threshold often means there are more below the reporting 
threshold.  

120s I think is the default Linux op timeout.  
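
The reporting threshold is osd_op_complaint_time, so to check or tune it, something like:

ceph config get osd osd_op_complaint_time     # 30 by default, IIRC
ceph config set osd osd_op_complaint_time 60  # only if the extra latency is understood and acceptable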

> On Apr 6, 2024, at 10:53 AM, David C.  wrote:
> 
> Hi,
> 
> Do slow ops impact data integrity => No
> Can I generally ignore it => No :)
> 
> This means that some client transactions are blocked for 120 sec (that's a
> lot).
> This could be a lock on the client side (CephFS, essentially), an incident
> on the infrastructure side (a disk about to fail, network instability,
> etc.), ...
> 
> When this happens, you need to look at the blocked requests.
> If you systematically see the same OSD ID, then look at dmesg and the SMART
> data for that disk.
> 
> This can also be an architectural problem (for example, a high IOPS load
> with osdmap on HDD, all multiplied by the erasure code).
> 
> *David*
> 
> 
>> On Fri, 5 Apr 2024 at 19:42, adam.ther  wrote:
>> 
>> Hello,
>> 
>> Do slow ops impact data integrity or can I generally ignore it? I'm
>> loading 3 hosts with a 10GB link and it is saturating the disks or the OSDs.
>> 
>>2024-04-05T15:33:10.625922+ mon.CEPHADM-1 [WRN] Health check
>>update: 3 slow ops, oldest one blocked for 117 sec, daemons
>>[osd.0,osd.13,osd.14,osd.17,osd.3,osd.4,osd.9] have slow ops.
>> (SLOW_OPS)
>> 
>>2024-04-05T15:33:15.628271+ mon.CEPHADM-1 [WRN] Health check
>>update: 2 slow ops, oldest one blocked for 123 sec, daemons
>>[osd.0,osd.1,osd.14,osd.17,osd.3,osd.4,osd.9] have slow ops. (SLOW_OPS)
>> 
>> I guess more to the point, what is the impact here?
>> 
>> Thanks,
>> 
>> Adam
>> 
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: NFS never recovers after slow ops

2024-04-06 Thread Torkil Svensgaard

On 06-04-2024 18:10, Torkil Svensgaard wrote:

Hi

Cephadm Reef 18.2.1

Started draining 5 18-20 TB HDD OSDs (DB/WAL on NVMe) on one host. Even 
with osd_max_backfills at 1 the OSDs get slow ops from time to time 
which seems odd as we recently did a huge reshuffle[1] involving the 
same host without seeing these slow ops.


I guess one difference is the disks were then only getting writes when 
they were added and now they are only being used for reads as they are 
being drained.


The slow ops eventually go away but I'm seeing stuck nfsd threads from 
RBD exports lingering on forever. I have to reboot the NFS server to get 
it going again, restarting nfs-server also just hangs.


Here's a stack trace from dmesg:

"
[Sat Apr  6 17:44:52 2024] INFO: task nfsd:52502 blocked for more than 
1245 seconds.
[Sat Apr  6 17:44:52 2024]   Not tainted 
5.14.0-362.8.1.test2.el9_3.x86_64 #1
[Sat Apr  6 17:44:52 2024] "echo 0 > /proc/sys/kernel/ 
hung_task_timeout_secs" disables this message.
[Sat Apr  6 17:44:52 2024] task:nfsd    state:D stack:0 pid: 
52502 ppid:2  flags:0x4000

[Sat Apr  6 17:44:52 2024] Call Trace:
[Sat Apr  6 17:44:52 2024]  
[Sat Apr  6 17:44:52 2024]  __schedule+0x20a/0x550
[Sat Apr  6 17:44:52 2024]  schedule+0x2d/0x70
[Sat Apr  6 17:44:52 2024]  schedule_timeout+0x11f/0x160
[Sat Apr  6 17:44:52 2024]  ? xfs_trans_read_buf_map+0x133/0x300 [xfs]
[Sat Apr  6 17:44:52 2024]  ? xfs_btree_read_buf_block.constprop.0+0x9a/ 
0xd0 [xfs]

[Sat Apr  6 17:44:52 2024]  __down_common+0x11f/0x200
[Sat Apr  6 17:44:52 2024]  ? xfs_btree_read_buf_block.constprop. 
0+0x30/0xd0 [xfs]

[Sat Apr  6 17:44:52 2024]  down+0x43/0x60
[Sat Apr  6 17:44:52 2024]  xfs_buf_lock+0x2d/0xe0 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_buf_find_lock+0x45/0xf0 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_buf_lookup.constprop.0+0xe4/0x170 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_buf_get_map+0xc1/0x3a0 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_buf_read_map+0x54/0x290 [xfs]
[Sat Apr  6 17:44:52 2024]  ? xfs_imap_to_bp+0x4e/0x70 [xfs]
[Sat Apr  6 17:44:52 2024]  ? xfs_imap_lookup+0x173/0x1d0 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_trans_read_buf_map+0x133/0x300 [xfs]
[Sat Apr  6 17:44:52 2024]  ? xfs_imap_to_bp+0x4e/0x70 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_imap_to_bp+0x4e/0x70 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_iget_cache_miss+0xa2/0x370 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_iget+0x19f/0x270 [xfs]
[Sat Apr  6 17:44:52 2024]  ? __pfx_nfsd_acceptable+0x10/0x10 [nfsd]
[Sat Apr  6 17:44:52 2024]  xfs_nfs_get_inode.isra.0+0x5e/0xa0 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_fs_fh_to_dentry+0x48/0xb0 [xfs]
[Sat Apr  6 17:44:52 2024]  exportfs_decode_fh_raw+0x60/0x2e0
[Sat Apr  6 17:44:52 2024]  ? exp_find_key+0x99/0x1e0 [nfsd]
[Sat Apr  6 17:44:52 2024]  ? rcu_nocb_try_bypass+0x4d/0x440
[Sat Apr  6 17:44:52 2024]  ? __kmalloc+0x19b/0x370
[Sat Apr  6 17:44:52 2024]  ? __pfx_put_cred_rcu+0x10/0x10
[Sat Apr  6 17:44:52 2024]  ? call_rcu+0x114/0x310
[Sat Apr  6 17:44:52 2024]  nfsd_set_fh_dentry+0x2b9/0x470 [nfsd]
[Sat Apr  6 17:44:52 2024]  fh_verify+0x1b3/0x2f0 [nfsd]
[Sat Apr  6 17:44:52 2024]  nfsd4_putfh+0x3e/0x70 [nfsd]
[Sat Apr  6 17:44:52 2024]  nfsd4_proc_compound+0x44e/0x700 [nfsd]
[Sat Apr  6 17:44:52 2024]  nfsd_dispatch+0x53/0x170 [nfsd]
[Sat Apr  6 17:44:52 2024]  svc_process_common+0x357/0x640 [sunrpc]
[Sat Apr  6 17:44:52 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[Sat Apr  6 17:44:52 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
[Sat Apr  6 17:44:52 2024]  svc_process+0x12d/0x180 [sunrpc]
[Sat Apr  6 17:44:52 2024]  nfsd+0xd5/0x190 [nfsd]
[Sat Apr  6 17:44:52 2024]  kthread+0xe0/0x100
[Sat Apr  6 17:44:52 2024]  ? __pfx_kthread+0x10/0x10
[Sat Apr  6 17:44:52 2024]  ret_from_fork+0x2c/0x50
[Sat Apr  6 17:44:52 2024]  
"

Stack:

"
[root@cogsworth ~]# cat  /proc/52502/stack
[<0>] xfs_buf_lock+0x2d/0xe0 [xfs]
[<0>] xfs_buf_find_lock+0x45/0xf0 [xfs]
[<0>] xfs_buf_lookup.constprop.0+0xe4/0x170 [xfs]
[<0>] xfs_buf_get_map+0xc1/0x3a0 [xfs]
[<0>] xfs_buf_read_map+0x54/0x290 [xfs]
[<0>] xfs_trans_read_buf_map+0x133/0x300 [xfs]
[<0>] xfs_imap_to_bp+0x4e/0x70 [xfs]
[<0>] xfs_iget_cache_miss+0xa2/0x370 [xfs]
[<0>] xfs_iget+0x19f/0x270 [xfs]
[<0>] xfs_nfs_get_inode.isra.0+0x5e/0xa0 [xfs]
[<0>] xfs_fs_fh_to_dentry+0x48/0xb0 [xfs]
[<0>] exportfs_decode_fh_raw+0x60/0x2e0
[<0>] nfsd_set_fh_dentry+0x2b9/0x470 [nfsd]
[<0>] fh_verify+0x1b3/0x2f0 [nfsd]
[<0>] nfsd4_putfh+0x3e/0x70 [nfsd]
[<0>] nfsd4_proc_compound+0x44e/0x700 [nfsd]
[<0>] nfsd_dispatch+0x53/0x170 [nfsd]
[<0>] svc_process_common+0x357/0x640 [sunrpc]
[<0>] svc_process+0x12d/0x180 [sunrpc]
[<0>] nfsd+0xd5/0x190 [nfsd]
[<0>] kthread+0xe0/0x100
[<0>] ret_from_fork+0x2c/0x50
"

The nfsd threads do not recover even with nobackfill set, so the cluster 
is essentially idle:


"
[root@lazy ~]# ceph -s
   cluster:
     id: X
     health: HEALTH_ERR
     nobackfill,noscrub,nodeep-scrub flag(s) set
     1 scrub errors
     Possible data damage: 1 pg inc

[ceph-users] Re: NFS never recovers after slow ops

2024-04-06 Thread Eugen Block

Hi Torkil,

I assume the affected OSDs were the ones with slow requests, no? You  
should still see them in some of the logs (mon, mgr).
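
For example (cephadm assumed, daemon names below are placeholders):

ceph log last 200                                # recent cluster log, including SLOW_OPS updates
cephadm logs --name mon.<host> | grep -i slow
cephadm logs --name mgr.<host> | grep -i slow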


Zitat von Torkil Svensgaard :


On 06-04-2024 18:10, Torkil Svensgaard wrote:

Hi

Cephadm Reef 18.2.1

Started draining 5 18-20 TB HDD OSDs (DB/WAL on NVMe) on one host.  
Even with osd_max_backfills at 1 the OSDs get slow ops from time to  
time which seems odd as we recently did a huge reshuffle[1]  
involving the same host without seeing these slow ops.


I guess one difference is the disks were then only getting writes  
when they were added and now they are only being used for reads as  
they are being drained.


The slow ops eventually go away but I'm seeing stuck nfsd threads  
from RBD exports lingering on forever. I have to reboot the NFS  
server to get it going again, restarting nfs-server also just hangs.


Here's a stack trace from dmesg:

"
[Sat Apr  6 17:44:52 2024] INFO: task nfsd:52502 blocked for more  
than 1245 seconds.
[Sat Apr  6 17:44:52 2024]   Not tainted  
5.14.0-362.8.1.test2.el9_3.x86_64 #1
[Sat Apr  6 17:44:52 2024] "echo 0 > /proc/sys/kernel/  
hung_task_timeout_secs" disables this message.
[Sat Apr  6 17:44:52 2024] task:nfsd    state:D stack:0  
pid: 52502 ppid:2  flags:0x4000

[Sat Apr  6 17:44:52 2024] Call Trace:
[Sat Apr  6 17:44:52 2024]  
[Sat Apr  6 17:44:52 2024]  __schedule+0x20a/0x550
[Sat Apr  6 17:44:52 2024]  schedule+0x2d/0x70
[Sat Apr  6 17:44:52 2024]  schedule_timeout+0x11f/0x160
[Sat Apr  6 17:44:52 2024]  ? xfs_trans_read_buf_map+0x133/0x300 [xfs]
[Sat Apr  6 17:44:52 2024]  ?  
xfs_btree_read_buf_block.constprop.0+0x9a/ 0xd0 [xfs]

[Sat Apr  6 17:44:52 2024]  __down_common+0x11f/0x200
[Sat Apr  6 17:44:52 2024]  ? xfs_btree_read_buf_block.constprop.  
0+0x30/0xd0 [xfs]

[Sat Apr  6 17:44:52 2024]  down+0x43/0x60
[Sat Apr  6 17:44:52 2024]  xfs_buf_lock+0x2d/0xe0 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_buf_find_lock+0x45/0xf0 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_buf_lookup.constprop.0+0xe4/0x170 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_buf_get_map+0xc1/0x3a0 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_buf_read_map+0x54/0x290 [xfs]
[Sat Apr  6 17:44:52 2024]  ? xfs_imap_to_bp+0x4e/0x70 [xfs]
[Sat Apr  6 17:44:52 2024]  ? xfs_imap_lookup+0x173/0x1d0 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_trans_read_buf_map+0x133/0x300 [xfs]
[Sat Apr  6 17:44:52 2024]  ? xfs_imap_to_bp+0x4e/0x70 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_imap_to_bp+0x4e/0x70 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_iget_cache_miss+0xa2/0x370 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_iget+0x19f/0x270 [xfs]
[Sat Apr  6 17:44:52 2024]  ? __pfx_nfsd_acceptable+0x10/0x10 [nfsd]
[Sat Apr  6 17:44:52 2024]  xfs_nfs_get_inode.isra.0+0x5e/0xa0 [xfs]
[Sat Apr  6 17:44:52 2024]  xfs_fs_fh_to_dentry+0x48/0xb0 [xfs]
[Sat Apr  6 17:44:52 2024]  exportfs_decode_fh_raw+0x60/0x2e0
[Sat Apr  6 17:44:52 2024]  ? exp_find_key+0x99/0x1e0 [nfsd]
[Sat Apr  6 17:44:52 2024]  ? rcu_nocb_try_bypass+0x4d/0x440
[Sat Apr  6 17:44:52 2024]  ? __kmalloc+0x19b/0x370
[Sat Apr  6 17:44:52 2024]  ? __pfx_put_cred_rcu+0x10/0x10
[Sat Apr  6 17:44:52 2024]  ? call_rcu+0x114/0x310
[Sat Apr  6 17:44:52 2024]  nfsd_set_fh_dentry+0x2b9/0x470 [nfsd]
[Sat Apr  6 17:44:52 2024]  fh_verify+0x1b3/0x2f0 [nfsd]
[Sat Apr  6 17:44:52 2024]  nfsd4_putfh+0x3e/0x70 [nfsd]
[Sat Apr  6 17:44:52 2024]  nfsd4_proc_compound+0x44e/0x700 [nfsd]
[Sat Apr  6 17:44:52 2024]  nfsd_dispatch+0x53/0x170 [nfsd]
[Sat Apr  6 17:44:52 2024]  svc_process_common+0x357/0x640 [sunrpc]
[Sat Apr  6 17:44:52 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[Sat Apr  6 17:44:52 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
[Sat Apr  6 17:44:52 2024]  svc_process+0x12d/0x180 [sunrpc]
[Sat Apr  6 17:44:52 2024]  nfsd+0xd5/0x190 [nfsd]
[Sat Apr  6 17:44:52 2024]  kthread+0xe0/0x100
[Sat Apr  6 17:44:52 2024]  ? __pfx_kthread+0x10/0x10
[Sat Apr  6 17:44:52 2024]  ret_from_fork+0x2c/0x50
[Sat Apr  6 17:44:52 2024]  
"

Stack:

"
[root@cogsworth ~]# cat  /proc/52502/stack
[<0>] xfs_buf_lock+0x2d/0xe0 [xfs]
[<0>] xfs_buf_find_lock+0x45/0xf0 [xfs]
[<0>] xfs_buf_lookup.constprop.0+0xe4/0x170 [xfs]
[<0>] xfs_buf_get_map+0xc1/0x3a0 [xfs]
[<0>] xfs_buf_read_map+0x54/0x290 [xfs]
[<0>] xfs_trans_read_buf_map+0x133/0x300 [xfs]
[<0>] xfs_imap_to_bp+0x4e/0x70 [xfs]
[<0>] xfs_iget_cache_miss+0xa2/0x370 [xfs]
[<0>] xfs_iget+0x19f/0x270 [xfs]
[<0>] xfs_nfs_get_inode.isra.0+0x5e/0xa0 [xfs]
[<0>] xfs_fs_fh_to_dentry+0x48/0xb0 [xfs]
[<0>] exportfs_decode_fh_raw+0x60/0x2e0
[<0>] nfsd_set_fh_dentry+0x2b9/0x470 [nfsd]
[<0>] fh_verify+0x1b3/0x2f0 [nfsd]
[<0>] nfsd4_putfh+0x3e/0x70 [nfsd]
[<0>] nfsd4_proc_compound+0x44e/0x700 [nfsd]
[<0>] nfsd_dispatch+0x53/0x170 [nfsd]
[<0>] svc_process_common+0x357/0x640 [sunrpc]
[<0>] svc_process+0x12d/0x180 [sunrpc]
[<0>] nfsd+0xd5/0x190 [nfsd]
[<0>] kthread+0xe0/0x100
[<0>] ret_from_fork+0x2c/0x50
"

The nfsd threads do not recover even with nobackfill set, so the  
cluster is essentially idle:


"
[root@lazy ~]# ceph -s
  cluster:
    id: