Hi all,

I'm running a small converged Ceph cluster (mons, OSDs, mds on the same
3 hosts, one of which is also a CephFS client). I noticed that sometimes
the CephFS client seems to deadlock. There are other CephFS clients on
the network, but they generally don't collide much with the local client
on file/dir accesses.

I found this old thread that describes a similar situation:

https://www.mail-archive.com/search?l=ceph-us...@lists.ceph.com&q=subject:%22Re%5C%3A+%5C%5Bceph%5C-users%5C%5D+co%5C-located+cephfs+client+deadlock%22&o=newest&f=1

The kernel version is 6.14.6 for the main hosts, 6.14.8 for a few other
unrelated clients, and 6.3.2 for one older client. Ceph version is
19.2.2. Ceph status was clean and dmesg didn't show any warnings around
the time of the hang, other than a few of these (which I always get and
seem to be related to SELinux attributes, but appear to be harmless):

[3447915.841214] ceph: [667e28f4-8964-454b-b777-03ce12b67742 6505232]:
sync 00000000f6e49a9a 100033726fb.152 during filling trace
[3447915.851171] SELinux: inode_doinit_use_xattr:  getxattr returned 16
for dev=ceph ino=1099565573883

Restarting the mds (letting it fail over to the standby-replay mds) did
not help the situation, but after finding the above thread I noticed
that all the requests were stuck on the same OSD:

> root@flamingo:~# cat /sys/kernel/debug/ceph/*/osdc
> REQUESTS 17 homeless 0
> 46841260  osd17  14.2ab058d9  14.19s0  [17,18,15,6,7,4,3,0,1]/17  [17,18,15,6,7,4,3,0,1]/17  e25593  10002e55314.0000003c  0x400014  1  read
> 46841268  osd17  14.571aa108  14.8s0  [17,16,15,0,1,2,4,5,7]/17  [17,16,15,0,1,2,4,5,7]/17  e25593  100030edef7.00000039  0x400014  1  read
> 46841311  osd17  14.7b4e2d92  14.12s0  [17,16,15,6,7,4,2,1,3]/17  [17,16,15,6,7,4,2,1,3]/17  e25593  1000318f566.00000179  0x400014  1  read
> 46841323  osd17  14.deec222d  14.2ds0  [17,16,18,4,7,5,2,1,0]/17  [17,16,18,4,7,5,2,1,0]/17  e25593  100030edf00.00000028  0x400014  1  read
> 46841334  osd17  14.eb940708  14.8s0  [17,16,15,0,1,2,4,5,7]/17  [17,16,15,0,1,2,4,5,7]/17  e25593  10003042460.000001c8  0x400014  1  read
> 46841336  osd17  14.93065908  14.8s0  [17,16,15,0,1,2,4,5,7]/17  [17,16,15,0,1,2,4,5,7]/17  e25593  1000318f566.00000181  0x400014  1  read
> 46841350  osd17  14.3e9eeaa5  14.25s0  [17,18,15,2,0,1,7,6,4]/17  [17,18,15,2,0,1,7,6,4]/17  e25593  10003498f94.00000000  0x400024  1  write
> 46841359  osd17  14.5120499  14.19s0  [17,18,15,6,7,4,3,0,1]/17  [17,18,15,6,7,4,3,0,1]/17  e25593  10002e53590.00000000  0x400014  1  read
> 46841367  osd17  14.25e874a5  14.25s0  [17,18,15,2,0,1,7,6,4]/17  [17,18,15,2,0,1,7,6,4]/17  e25593  1000318f566.00000196  0x400014  1  read
> 46841376  osd17  14.cd26e465  14.25s0  [17,18,15,2,0,1,7,6,4]/17  [17,18,15,2,0,1,7,6,4]/17  e25593  1000318f566.00000199  0x400014  1  read
> 46841405  osd17  14.855295f5  14.35s0  [17,18,15,3,0,2,4,7,5]/17  [17,18,15,3,0,2,4,7,5]/17  e25593  1000318f566.000001a7  0x400014  1  read
> 46841421  osd17  14.57d5ef2c  14.2cs0  [17,18,16,1,3,2,4,6,5]/17  [17,18,16,1,3,2,4,6,5]/17  e25593  10002fe07b1.00000008  0x400014  1  read
> 46841424  osd17  14.ea554059  14.19s0  [17,18,15,6,7,4,3,0,1]/17  [17,18,15,6,7,4,3,0,1]/17  e25593  1000318f566.000001ac  0x400014  1  read
> 46841429  osd17  14.fa91b525  14.25s0  [17,18,15,2,0,1,7,6,4]/17  [17,18,15,2,0,1,7,6,4]/17  e25593  100030916b5.0000004f  0x400014  1  read
> 46841447  osd17  14.88f24788  14.8s0  [17,16,15,0,1,2,4,5,7]/17  [17,16,15,0,1,2,4,5,7]/17  e25593  100032d95c7.00000011  0x400014  1  read
> 46841468  osd17  12.159b3dd2  12.12  [17,5,2]/17  [17,5,2]/17  e25593  10003307266.00000013  0x400014  1  read
> 46841496  osd17  12.a08092d2  12.12  [17,5,2]/17  [17,5,2]/17  e25593  10003493c3f.00000050  0x400014  1  read
> LINGER REQUESTS
> BACKOFFS
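
In case it's useful, something like this should tally the stuck
requests per target OSD (field positions as in the dump above; the
request lines are the ones starting with a tid):

  awk '/^[0-9]/ {print $2}' /sys/kernel/debug/ceph/*/osdc | sort | uniq -c | sort -rn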

Restarting that OSD fixed it. Looking at the OSD log at the exact time
of the hang (determined by other logs), I see this:

> 2025-07-18T06:39:06.992+0900 fffeed21bf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency_fn slow operation observed 
> for _remove, latency = 5.162732124s, lat = 5s cid =12.3_head oid 
> =#12:c0ccf68d:::1000336b41c.0000009d:13d#
> 2025-07-18T06:39:06.992+0900 fffeed21bf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency slow operation observed for 
> submit_transact, latency = 5.167396545s
> 2025-07-18T06:39:06.993+0900 fffef837bf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency_fn slow operation observed 
> for _txc_committed_kv, latency = 5.168415070s, txc = 0xaaabe3f57000
> 2025-07-18T06:39:14.834+0900 fffeeca0bf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency_fn slow operation observed 
> for _remove, latency = 5.340788364s, lat = 5s cid =12.14_head oid 
> =#12:28e0c837:::10003307273.00000024:13d#
> 2025-07-18T06:39:14.835+0900 fffeeca0bf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency slow operation observed for 
> submit_transact, latency = 5.342024803s
> 2025-07-18T06:39:14.843+0900 fffef837bf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency_fn slow operation observed 
> for _txc_committed_kv, latency = 5.349779129s, txc = 0xaaac19c75b00
> 2025-07-18T06:40:37.552+0900 fffeeb9ebf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency_fn slow operation observed 
> for _remove, latency = 5.124079704s, lat = 5s cid =12.c_head oid 
> =#12:340862a9:::100033073f1.0000002c:13d#
> 2025-07-18T06:40:37.560+0900 fffeeb9ebf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency slow operation observed for 
> submit_transact, latency = 5.132128239s
> 2025-07-18T06:40:37.568+0900 fffef837bf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency_fn slow operation observed 
> for _txc_committed_kv, latency = 5.140548706s, txc = 0xaaabf25b8000
> 2025-07-18T06:40:59.228+0900 fffeec1fbf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency_fn slow operation observed 
> for _remove, latency = 5.677710056s, lat = 5s cid =12.2_head oid 
> =#12:40b82ac1:::1000348d274.00000008:13d#
> 2025-07-18T06:40:59.229+0900 fffeec1fbf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency slow operation observed for 
> submit_transact, latency = 5.679266453s
> 2025-07-18T06:40:59.232+0900 fffef837bf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency_fn slow operation observed 
> for _txc_committed_kv, latency = 5.682381630s, txc = 0xaaac34479500
> 2025-07-18T06:41:01.799+0900 fffeeb9ebf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency_fn slow operation observed 
> for _remove, latency = 5.117489815s, lat = 5s cid =12.c_head oid 
> =#12:340e7251:::10003307273.00000027:13d#
> 2025-07-18T06:41:01.799+0900 fffeeb9ebf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency slow operation observed for 
> submit_transact, latency = 5.118617058s
> 2025-07-18T06:41:01.804+0900 fffef837bf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency_fn slow operation observed 
> for _txc_committed_kv, latency = 5.123104095s, txc = 0xaaabbf99de00
> 2025-07-18T06:42:28.863+0900 fffeec1fbf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency_fn slow operation observed 
> for _remove, latency = 5.191506386s, lat = 5s cid =12.3_head oid 
> =#12:c2349a5a:::1000336b418.000000f0:13d#
> 2025-07-18T06:42:28.863+0900 fffeec1fbf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency slow operation observed for 
> submit_transact, latency = 5.192711353s
> 2025-07-18T06:42:28.864+0900 fffef837bf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency_fn slow operation observed 
> for _txc_committed_kv, latency = 5.193384171s, txc = 0xaaabd1e51e00
> 2025-07-18T06:42:34.152+0900 fffeeca0bf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency_fn slow operation observed 
> for _remove, latency = 5.287574291s, lat = 5s cid =12.3_head oid 
> =#12:c2bf668a:::10003307273.0000007b:13d#
> 2025-07-18T06:42:34.154+0900 fffeeca0bf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency slow operation observed for 
> submit_transact, latency = 5.289901257s
> 2025-07-18T06:42:34.156+0900 fffef837bf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency_fn slow operation observed 
> for _txc_committed_kv, latency = 5.291684628s, txc = 0xaaabf9f90c00
> 2025-07-18T06:42:40.801+0900 ffff014cbf00 -1 osd.17 25584 get_health_metrics 
> reporting 1 slow ops, oldest is osd_op(client.6505232.0:46841429 14.25s0 
> 14.fa91b525 (undecoded) ondisk+read+known_if_redirected e25584)
> 2025-07-18T06:42:40.801+0900 ffff014cbf00  0 log_channel(cluster) log [WRN] : 
> 1 slow requests (by type [ 'queued for pg' : 1 ] most affected pool [ 
> 'cephfs2_data_hec5.4' : 1 ])
[ repeats ... ]
> 2025-07-18T06:44:15.954+0900 ffff014cbf00 -1 osd.17 25584 get_health_metrics 
> reporting 1 slow ops, oldest is osd_op(client.6505232.0:46841429 14.25s0 
> 14.fa91b525 (undecoded) ondisk+read+known_if_redirected e25584)
> 2025-07-18T06:44:15.954+0900 ffff014cbf00  0 log_channel(cluster) log [WRN] : 
> 1 slow requests (by type [ 'queued for pg' : 1 ] most affected pool [ 
> 'cephfs2_data_hec5.4' : 1 ])
> 2025-07-18T06:44:27.591+0900 fffeeca0bf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency_fn slow operation observed 
> for _remove, latency = 5.561402321s, lat = 5s cid =12.0_head oid 
> =#12:064b3406:::1000348bd03.0000004a:13d#
> 2025-07-18T06:44:27.594+0900 fffeeca0bf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency slow operation observed for 
> submit_transact, latency = 5.565083027s
> 2025-07-18T06:44:27.595+0900 fffef837bf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency_fn slow operation observed 
> for _txc_committed_kv, latency = 5.565894604s, txc = 0xaaabe0e98c00
> 2025-07-18T06:44:29.442+0900 ffff014cbf00 -1 osd.17 25584 get_health_metrics 
> reporting 1 slow ops, oldest is osd_op(client.6505232.0:46841429 14.25s0 
> 14.fa91b525 (undecoded) ondisk+read+known_if_redirected e25584)
> 2025-07-18T06:44:29.442+0900 ffff014cbf00  0 log_channel(cluster) log [WRN] : 
> 1 slow requests (by type [ 'queued for pg' : 1 ] most affected pool [ 
> 'cephfs2_data_hec5.4' : 1 ])
[ repeats ... ]
> 2025-07-18T06:44:32.362+0900 ffff014cbf00 -1 osd.17 25584 get_health_metrics 
> reporting 2 slow ops, oldest is osd_op(client.6505232.0:46841429 14.25s0 
> 14.fa91b525 (undecoded) ondisk+read+known_if_redirected e25584)
> 2025-07-18T06:44:32.362+0900 ffff014cbf00  0 log_channel(cluster) log [WRN] : 
> 2 slow requests (by type [ 'queued for pg' : 1 'started' : 1 ] most affected 
> pool [ 'cephfs2_data_hr3' : 1 ])
> 2025-07-18T06:44:33.314+0900 ffff014cbf00 -1 osd.17 25584 get_health_metrics 
> reporting 2 slow ops, oldest is osd_op(client.6505232.0:46841429 14.25s0 
> 14.fa91b525 (undecoded) ondisk+read+known_if_redirected e25584)
> 2025-07-18T06:44:33.314+0900 ffff014cbf00  0 log_channel(cluster) log [WRN] : 
> 2 slow requests (by type [ 'queued for pg' : 1 'started' : 1 ] most affected 
> pool [ 'cephfs2_data_hr3' : 1 ])
> 2025-07-18T06:44:34.328+0900 ffff014cbf00 -1 osd.17 25584 get_health_metrics 
> reporting 2 slow ops, oldest is osd_op(client.6505232.0:46841429 14.25s0 
> 14.fa91b525 (undecoded) ondisk+read+known_if_redirected e25584)
> 2025-07-18T06:44:34.328+0900 ffff014cbf00  0 log_channel(cluster) log [WRN] : 
> 2 slow requests (by type [ 'queued for pg' : 1 'started' : 1 ] most affected 
> pool [ 'cephfs2_data_hr3' : 1 ])
> 2025-07-18T06:44:38.361+0900 ffff014cbf00 -1 osd.17 25584 get_health_metrics 
> reporting 2 slow ops, oldest is osd_op(client.6505232.0:46841376 14.25s0 
> 14.cd26e465 (undecoded) ondisk+read+known_if_redirected e25584)
> 2025-07-18T06:44:38.361+0900 ffff014cbf00  0 log_channel(cluster) log [WRN] : 
> 2 slow requests (by type [ 'queued for pg' : 2 ] most affected pool [ 
> 'cephfs2_data_hec5.4' : 2 ])
> 2025-07-18T06:44:39.369+0900 ffff014cbf00 -1 osd.17 25584 get_health_metrics 
> reporting 2 slow ops, oldest is osd_op(client.6505232.0:46841376 14.25s0 
> 14.cd26e465 (undecoded) ondisk+read+known_if_redirected e25584)
> 2025-07-18T06:44:39.369+0900 ffff014cbf00  0 log_channel(cluster) log [WRN] : 
> 2 slow requests (by type [ 'queued for pg' : 2 ] most affected pool [ 
> 'cephfs2_data_hec5.4' : 2 ])
> 2025-07-18T06:44:40.356+0900 ffff014cbf00 -1 osd.17 25584 get_health_metrics 
> reporting 5 slow ops, oldest is osd_op(client.6505232.0:46841260 14.19s0 
> 14.2ab058d9 (undecoded) ondisk+read+known_if_redirected e25583)
> 2025-07-18T06:44:40.356+0900 ffff014cbf00  0 log_channel(cluster) log [WRN] : 
> 5 slow requests (by type [ 'queued for pg' : 5 ] most affected pool [ 
> 'cephfs2_data_hec5.4' : 5 ])
> 2025-07-18T06:44:41.324+0900 ffff014cbf00 -1 osd.17 25584 get_health_metrics 
> reporting 1 slow ops, oldest is osd_op(client.6505232.0:46841429 14.25s0 
> 14.fa91b525 (undecoded) ondisk+read+known_if_redirected e25584)
> 2025-07-18T06:44:41.324+0900 ffff014cbf00  0 log_channel(cluster) log [WRN] : 
> 1 slow requests (by type [ 'queued for pg' : 1 ] most affected pool [ 
> 'cephfs2_data_hec5.4' : 1 ])
[ repeats ... ]
> 2025-07-18T06:45:14.197+0900 ffff014cbf00 -1 osd.17 25584 get_health_metrics 
> reporting 1 slow ops, oldest is osd_op(client.6505232.0:46841429 14.25s0 
> 14.fa91b525 (undecoded) ondisk+read+known_if_redirected e25584)
> 2025-07-18T06:45:14.197+0900 ffff014cbf00  0 log_channel(cluster) log [WRN] : 
> 1 slow requests (by type [ 'queued for pg' : 1 ] most affected pool [ 
> 'cephfs2_data_hec5.4' : 1 ])
> 2025-07-18T06:45:15.825+0900 fffeec1fbf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency_fn slow operation observed 
> for _remove, latency = 5.245476246s, lat = 5s cid =12.b_head oid 
> =#12:d12c2386:::1000336b41a.00000095:13d#
> 2025-07-18T06:45:15.826+0900 fffeec1fbf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency slow operation observed for 
> submit_transact, latency = 5.247073650s
> 2025-07-18T06:45:15.831+0900 fffef837bf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency_fn slow operation observed 
> for _txc_committed_kv, latency = 5.252201080s, txc = 0xaaabd1e50c00
> 2025-07-18T06:45:19.037+0900 ffff014cbf00 -1 osd.17 25584 get_health_metrics 
> reporting 1 slow ops, oldest is osd_op(client.6505232.0:46841429 14.25s0 
> 14.fa91b525 (undecoded) ondisk+read+known_if_redirected e25584)
> 2025-07-18T06:45:19.037+0900 ffff014cbf00  0 log_channel(cluster) log [WRN] : 
> 1 slow requests (by type [ 'queued for pg' : 1 ] most affected pool [ 
> 'cephfs2_data_hec5.4' : 1 ])
> 2025-07-18T06:45:20.056+0900 ffff014cbf00 -1 osd.17 25584 get_health_metrics 
> reporting 1 slow ops, oldest is osd_op(client.6505232.0:46841429 14.25s0 
> 14.fa91b525 (undecoded) ondisk+read+known_if_redirected e25584)
> 2025-07-18T06:45:20.056+0900 ffff014cbf00  0 log_channel(cluster) log [WRN] : 
> 1 slow requests (by type [ 'queued for pg' : 1 ] most affected pool [ 
> 'cephfs2_data_hec5.4' : 1 ])
> 2025-07-18T06:45:21.010+0900 ffff014cbf00 -1 osd.17 25584 get_health_metrics 
> reporting 1 slow ops, oldest is osd_op(client.6505232.0:46841429 14.25s0 
> 14.fa91b525 (undecoded) ondisk+read+known_if_redirected e25584)
> 2025-07-18T06:45:21.010+0900 ffff014cbf00  0 log_channel(cluster) log [WRN] : 
> 1 slow requests (by type [ 'queued for pg' : 1 ] most affected pool [ 
> 'cephfs2_data_hec5.4' : 1 ])
> 2025-07-18T06:45:21.963+0900 ffff014cbf00 -1 osd.17 25584 get_health_metrics 
> reporting 1 slow ops, oldest is osd_op(client.6505232.0:46841429 14.25s0 
> 14.fa91b525 (undecoded) ondisk+read+known_if_redirected e25584)
> 2025-07-18T06:45:21.963+0900 ffff014cbf00  0 log_channel(cluster) log [WRN] : 
> 1 slow requests (by type [ 'queued for pg' : 1 ] most affected pool [ 
> 'cephfs2_data_hec5.4' : 1 ])
> 2025-07-18T06:45:22.973+0900 ffff014cbf00 -1 osd.17 25584 get_health_metrics 
> reporting 2 slow ops, oldest is osd_op(client.6505232.0:46841376 14.25s0 
> 14.cd26e465 (undecoded) ondisk+read+known_if_redirected e25584)
> 2025-07-18T06:45:22.973+0900 ffff014cbf00  0 log_channel(cluster) log [WRN] : 
> 2 slow requests (by type [ 'queued for pg' : 2 ] most affected pool [ 
> 'cephfs2_data_hec5.4' : 2 ])
> 2025-07-18T06:45:23.958+0900 ffff014cbf00 -1 osd.17 25584 get_health_metrics 
> reporting 2 slow ops, oldest is osd_op(client.6505232.0:46841376 14.25s0 
> 14.cd26e465 (undecoded) ondisk+read+known_if_redirected e25584)
> 2025-07-18T06:45:23.958+0900 ffff014cbf00  0 log_channel(cluster) log [WRN] : 
> 2 slow requests (by type [ 'queued for pg' : 2 ] most affected pool [ 
> 'cephfs2_data_hec5.4' : 2 ])
> 2025-07-18T06:45:32.861+0900 ffff014cbf00 -1 osd.17 25584 get_health_metrics 
> reporting 1 slow ops, oldest is osd_op(client.6505232.0:46841429 14.25s0 
> 14.fa91b525 (undecoded) ondisk+read+known_if_redirected e25584)
[ repeats ... ]
> 2025-07-18T06:49:20.553+0900 ffff014cbf00  0 log_channel(cluster) log [WRN] : 
> 1 slow requests (by type [ 'queued for pg' : 1 ] most affected pool [ 
> 'cephfs2_data_hec5.4' : 1 ])
> 2025-07-18T06:49:21.547+0900 ffff014cbf00 -1 osd.17 25584 get_health_metrics 
> reporting 1 slow ops, oldest is osd_op(client.6505232.0:46841429 14.25s0 
> 14.fa91b525 (undecoded) ondisk+read+known_if_redirected e25584)
> 2025-07-18T06:49:21.547+0900 ffff014cbf00  0 log_channel(cluster) log [WRN] : 
> 1 slow requests (by type [ 'queued for pg' : 1 ] most affected pool [ 
> 'cephfs2_data_hec5.4' : 1 ])
> 2025-07-18T06:49:22.596+0900 ffff014cbf00 -1 osd.17 25584 get_health_metrics 
> reporting 1 slow ops, oldest is osd_op(client.6505232.0:46841429 14.25s0 
> 14.fa91b525 (undecoded) ondisk+read+known_if_redirected e25584)
> 2025-07-18T06:49:22.596+0900 ffff014cbf00  0 log_channel(cluster) log [WRN] : 
> 1 slow requests (by type [ 'queued for pg' : 1 ] most affected pool [ 
> 'cephfs2_data_hec5.4' : 1 ])
> 2025-07-18T06:49:48.204+0900 fffeec1fbf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency_fn slow operation observed 
> for _remove, latency = 5.027801514s, lat = 5s cid =12.1a_head oid 
> =#12:5b980d7b:::100033073d8.0000002a:13d#
> 2025-07-18T06:49:48.205+0900 fffeec1fbf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency slow operation observed for 
> submit_transact, latency = 5.031510353s
> 2025-07-18T06:49:48.214+0900 fffef837bf00  0 
> bluestore(/var/lib/ceph/osd/ceph-17) log_latency_fn slow operation observed 
> for _txc_committed_kv, latency = 5.040008068s, txc = 0xaaabe5c0cf00

Note that osd17 is on a different host (but some of the shards of the
affected PGs are on the same host).
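
For completeness, the placement can be double-checked with something
like (ceph CLI from any admin host):

  ceph osd find 17     # host and crush location of osd.17
  ceph pg map 14.25    # up/acting set of one of the stuck PGs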

After this 10-minute period, the logs are clean. So it looks like op
46841429 was considered slow by the OSD for about 10 minutes, then
stopped being reported as slow, yet the kernel client never saw the op
complete and it stayed stuck there?
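
If it recurs, I guess the useful thing to check before restarting the
OSD is whether the op is still visible on the OSD side at all, e.g.
something like:

  ceph tell osd.17 dump_ops_in_flight       # ops the OSD still thinks it's working on
  ceph tell osd.17 dump_historic_slow_ops   # recently completed ops that were flagged slow

If the tid shows up in neither, the OSD presumably believes it already
replied, which would point more at the reply being lost between the OSD
and the kernel client than at the op being stuck inside the OSD.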

Other ops from that client are also mentioned, and the reporting
doesn't seem right to me. Op 46841429 is the first one to become
"slow", but then another op, 46841376 (a lower tid, so issued earlier),
is reported as the "oldest" slow op, and then it goes back to 46841429.
Unless these reports refer to different phases of the same op, it
doesn't make sense to me that a younger op could be reported as the
oldest, then an older op, then the younger one again. If one of the ops
is older, it should consistently be reported as the oldest, right?
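
To double-check the sequence, the reported "oldest" tids can be pulled
straight out of the OSD log with something like the following (adjust
the log path for your deployment); uniq -c collapses consecutive
repeats, which makes the flip-flopping between 46841429 and 46841376
easy to see:

  grep -oE 'oldest is osd_op\(client\.6505232\.0:[0-9]+' ceph-osd.17.log | uniq -c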

I checked all the other OSD logs and there's nothing interesting other
than a couple of other OSDs that also got slow ops at around the same
time (but for a much shorter duration).

Grafana stats show that a snaptrim operation started at exactly that
time and lasted 10 minutes. The CPU logs show that this did reproduce
the CPU usage / page cache thrashing anomaly I previously mentioned on
the list (high sys >> user CPU usage), but this time the thrashing only
lasted 10 minutes, so I would say the hang is unlikely to be directly
caused by that bug. It could be, though, that the high load triggered
slow ops in general, which then tickled another bug that caused the ops
to get stuck.
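
If snaptrim load does turn out to be the trigger, the obvious knobs to
experiment with would be the snap trim throttles; I haven't changed any
of these yet, just noting them for reference:

  ceph pg dump pgs_brief | grep -c snaptrim               # how many PGs are currently trimming
  ceph config set osd osd_snap_trim_sleep_hdd 5           # sleep between trim ops on HDD OSDs
  ceph config set osd osd_pg_max_concurrent_snap_trims 1  # fewer concurrent trims per PG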

Happy to help debug this if it happens again, if anyone has pointers.
(I think this might not be the first time it has happened, but it's the
first time I found that thread, learned about this issue specifically,
and learned which debug file to check to find the culprit OSD.)
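
Concretely, next time it hangs I plan to capture at least the
client-side debugfs state and the MDS view before restarting anything,
roughly:

  # on the hung client (kernel client debugfs)
  cat /sys/kernel/debug/ceph/*/osdc   # objecter requests, as above
  cat /sys/kernel/debug/ceph/*/mdsc   # in-flight MDS requests
  cat /sys/kernel/debug/ceph/*/caps   # cap state

  # on the active MDS host (mds.<name> = placeholder for the active MDS)
  ceph daemon mds.<name> dump_ops_in_flight
  ceph daemon mds.<name> objecter_requests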

- Hector
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
