On Fri, Nov 9, 2018 at 2:24 AM Kenneth Waegeman <kenneth.waege...@ugent.be> wrote:
> Hi all,
>
> On Mimic 13.2.1, we are seeing blocked ops on cephfs after removing
> some snapshots:
>
> [root@osd001 ~]# ceph -s
>   cluster:
>     id:     92bfcf0a-1d39-43b3-b60f-44f01b630e47
>     health: HEALTH_WARN
>             5 slow ops, oldest one blocked for 1162 sec, mon.mds03 has slow ops
>
>   services:
>     mon: 3 daemons, quorum mds01,mds02,mds03
>     mgr: mds02(active), standbys: mds03, mds01
>     mds: ceph_fs-2/2/2 up {0=mds03=up:active,1=mds01=up:active}, 1 up:standby
>     osd: 544 osds: 544 up, 544 in
>
>   io:
>     client: 5.4 KiB/s wr, 0 op/s rd, 0 op/s wr
>
> [root@osd001 ~]# ceph health detail
> HEALTH_WARN 5 slow ops, oldest one blocked for 1327 sec, mon.mds03 has slow ops
> SLOW_OPS 5 slow ops, oldest one blocked for 1327 sec, mon.mds03 has slow ops
>
> [root@osd001 ~]# ceph -v
> ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
>
> Is this a known issue?
>
> Cheers,
> Kenneth

It's not exactly a known issue, but from the output and story you've got
here it looks like the OSDs are deleting the snapshot data too slowly and
the MDS isn't getting replies quickly enough? Or maybe you have an
overlarge CephFS directory which is taking a long time to clean up
somehow; you should get the MDS ops and the MDS' objecter ops in flight
(commands sketched below) and see what specifically is taking so long.
-Greg
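
P.S. A minimal sketch of pulling those in-flight ops through the daemons'
admin sockets, assuming the daemon names match the hostnames in your
"ceph -s" output (mds03, mds01) and that each command is run on the host
where that daemon lives:

    # ops the active MDS is currently servicing
    ceph daemon mds.mds03 dump_ops_in_flight

    # requests the MDS has outstanding against the OSDs (its objecter)
    ceph daemon mds.mds03 objecter_requests

    # the mon reporting the slow ops can be inspected the same way
    ceph daemon mon.mds03 ops

Since you have two active ranks, repeat the MDS commands for mds.mds01 as
well.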