[ceph-users] Re: Provide more documentation for MDS performance tuning on large file systems

2020-12-05 Dan van der Ster
Hi Janek, My understanding is that the recall thresholds (see my list below) should be scaled proportionally. OTOH, I haven't played with the decay rates (and don't know if there's any significant value to tuning those). We have a recall tuning script that we use to deploy different factors whene
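Dan's list of thresholds and his recall tuning script are cut off in this excerpt. The following is a minimal sketch of the "scale the recall thresholds proportionally" idea. It assumes the thresholds in question are the `mds_recall_*` threshold options named below, and it leaves the decay rates (e.g. `mds_recall_max_decay_rate`) untouched, since Dan had not tuned those. The helper names `scale_thresholds` and `config_commands` are hypothetical. The baseline numbers are placeholders, not verified defaults for any Ceph release, so read the real values from your own cluster before using this.

```python
# Hypothetical sketch: multiply a set of MDS recall thresholds by one common
# factor (keeping their ratios constant) and print matching `ceph config set`
# commands. Decay-rate options are deliberately NOT included.
# ASSUMPTION: the baseline values below are placeholders, not verified Ceph
# defaults; fetch real ones with e.g. `ceph config get mds <option>`.

RECALL_THRESHOLDS = {
    "mds_recall_max_caps": 5000,
    "mds_recall_max_decay_threshold": 16384,
    "mds_recall_global_max_decay_threshold": 65536,
    "mds_recall_warning_threshold": 32768,
}


def scale_thresholds(baseline: dict[str, int], factor: float) -> dict[str, int]:
    """Scale every threshold by the same factor so their ratios stay constant."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    # round() returns an int; max() keeps every threshold at least 1.
    return {name: max(1, round(value * factor)) for name, value in baseline.items()}


def config_commands(scaled: dict[str, int]) -> list[str]:
    """Render one `ceph config set mds <option> <value>` line per option."""
    return [f"ceph config set mds {name} {value}" for name, value in scaled.items()]


if __name__ == "__main__":
    # Example: double all recall thresholds relative to the placeholder baseline.
    for cmd in config_commands(scale_thresholds(RECALL_THRESHOLDS, 2.0)):
        print(cmd)
```

The script only prints commands rather than running them, so the proposed values can be reviewed before they are applied to a live MDS.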

[ceph-users] Re: MDS lost, Filesystem degraded and wont mount

2020-12-05 Dan van der Ster
On Sat, Dec 5, 2020 at 2:41 PM Janek Bevendorff wrote:
>
> On 05/12/2020 09:26, Dan van der Ster wrote:
> > Hi Janek,
> >
> > I'd love to hear your standard maintenance procedures. Are you
> > cleaning up those open files outside of "rejoin" OOMs ?
>
> No, of course not. But those rejoin problems

[ceph-users] Re: MDS lost, Filesystem degraded and wont mount

2020-12-05 Janek Bevendorff
On 05/12/2020 09:26, Dan van der Ster wrote:
> Hi Janek, I'd love to hear your standard maintenance procedures. Are you cleaning up those open files outside of "rejoin" OOMs ?

No, of course not. But those rejoin problems happen more often than I'd like them to. It has become much better with re

[ceph-users] Re: MDS lost, Filesystem degraded and wont mount

2020-12-05 Dan van der Ster
Hi Janek, I'd love to hear your standard maintenance procedures. Are you cleaning up those open files outside of "rejoin" OOMs ? I guess we're pretty lucky with our CephFS's because we have more than 1k clients and it is pretty solid (though the last upgrade had a hiccup decreasing down to single