Hi,
have you resolved this issue in the meantime? If not, what is your
mds_cache_memory_limit? Increasing that, and maybe also
mds_log_max_segments, could help.
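For example, something along these lines (the values below are only
illustrative, adjust them to your hardware and workload):

  # check the current values
  ceph config get mds mds_cache_memory_limit
  ceph config get mds mds_log_max_segments

  # raise them, e.g. to 8 GiB cache and 256 log segments (example values)
  ceph config set mds mds_cache_memory_limit 8589934592
  ceph config set mds mds_log_max_segments 256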
Anything in ceph tell mds.{MDS} dump_blocked_ops?
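With the MDS name from your health warning that would be something like:

  ceph tell mds.arm-vol.k02r04nvm01.zaqebs dump_blocked_ops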
Regards,
Eugen
Quoting Adam Prycki <apry...@man.poznan.pl>:
Hello,
we are having issues with our CephFS cluster.
Any help would be appreciated.
We are still running 18.2.0.
During the holidays we had an outage caused by the rootfs filling up. OSDs
started dying randomly, and for a while not all PGs were active.
That issue has since been resolved and all OSDs work fine, but we are
stuck with some MDS issues.
The warnings we are concerned about:
[WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
mds.arm-vol.k02r04nvm01.zaqebs(mds.0): 29 slow metadata IOs are
blocked > 30 secs, oldest blocked for 1899 secs
[WRN] MDS_TRIM: 1 MDSs behind on trimming
mds.arm-vol.k02r04nvm01.zaqebs(mds.0): Behind on trimming
(4851/128) max_segments: 128, num_segments: 4851
1. Our MDSs are not trimming.
2. Our active MDS has slow metadata ops which we cannot explain.
CephFS status looks OK, the main MDS is active.
All metadata pool PGs are active and working, and there are no laggy PGs.
Trying to dump in-flight ops from the MDS also doesn't help:
ceph daemon ./ceph-mds.arm-vol.k02r04nvm01.zaqebs.asok dump_ops_in_flight
{
    "ops": [],
    "num_ops": 0
}
MDS failover or an MDS restart doesn't help either.
The slow metadata ops always return after an MDS restart (all MDSs have
this issue).
After a failover, the main MDS is stuck in the rejoin state for a long time.
We've used the mds_wipe_sessions config option to bring it back into the
active state quickly.
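(Roughly like this; the exact invocation we used may have differed:

  # temporarily let the MDS drop old client sessions on startup
  ceph config set mds mds_wipe_sessions true
  # ... restart / fail over the MDS ...
  ceph config set mds mds_wipe_sessions false
)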
I'm guessing the slow metadata ops are preventing the MDS from trimming,
but we cannot figure out what is causing them.
Best regards
Adam Prycki
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io