Hi Eugen,
thanks for the response! :-)
We have (kind of) solved the problem immediately at hand. The whole process
was stuck because the MDSes were actually getting killed: the amount of RAM
we had allocated to them was not sufficient to replay the logs completely.
Therefore, we increased the memory available to the MDSes so that the replay
could finish.
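For anyone hitting the same issue: the MDS memory footprint is driven by
mds_cache_memory_limit, which is a target rather than a hard cap, so a
daemon can temporarily need quite a bit more than that during replay. A
rough sketch of checking and raising it (the 16 GiB value below is only an
example, not necessarily what we ended up using):
ceph config get mds mds_cache_memory_limit
ceph config set mds mds_cache_memory_limit 17179869184   # ~16 GiB, example value
Whatever the limit, the RAM available to the MDS hosts (or containers) has
to stay comfortably above it.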
Hi,
sorry for not responding earlier.
Pardon my ignorance, but I'm not quite sure I know what you mean by subtree
pinning. I quickly googled it and saw it was introduced in Luminous. We are
running Pacific, so at first I assumed the feature was not out yet, but
Luminous is older than Pacific, so the feature should in fact be available
to us.
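For reference, in case we end up trying it: as far as I understand, a
subtree is pinned from a client mount by setting an extended attribute on
the directory, roughly like this (path and rank are just examples):
setfattr -n ceph.dir.pin -v 2 /mnt/cephfs/projects    # pin this subtree to MDS rank 2
setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/projects   # -1 removes the pin again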
Hi Emmanuel,
regarding the stopping state: we had a similar issue, see the thread
"MDS Upgrade from 17.2.5 to 17.2.6 not possible".
We solved it by failing the MDS that was stuck in the stopping state, but I
don't know if that's a good idea in general.
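For reference, failing a specific daemon boils down to something like the
following (the daemon name is a placeholder, and the journal unit name
depends on how the daemons are deployed):
ceph mds fail <daemon-name>            # e.g. the daemon shown as stopping in 'ceph fs status'
journalctl -u ceph-mds@<daemon-name>   # package-based installs; containerized setups differ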
What does the log of the MDS (stopping) show? We observed
Hi Eugen,
> Also, do you know why you use a multi-active MDS setup?
To be completely candid, I don't really know why this choice was made. I
assume the goal was to provide fault-tolerance and load-balancing.
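For reference, the configured number of active ranks and the current layout
can be checked with:
ceph fs get cephfs | grep max_mds
ceph fs status cephfs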
> Was that a requirement for subtree pinning (otherwise multiple active
> daemons would balance the metadata load dynamically)?
Hi Wes,
thanks for the heads-up.
Best,
Emmanuel
On Wed, May 24, 2023 at 5:47 PM Wesley Dillingham wrote:
> There was a memory issue with standby-replay that may have been resolved
> since, and the fix is in 16.2.10 (not sure); the suggestion at the time
> was to avoid standby-replay.
>
> Perhaps a dev can chime in on that status.
There was a memory issue with standby-replay that may have been resolved
since, and the fix is in 16.2.10 (not sure); the suggestion at the time was
to avoid standby-replay.
Perhaps a dev can chime in on that status. Your MDSs look pretty inactive.
I would consider scaling them down (potentially to a single active MDS).
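If you do scale down, it would be something like the following (cephfs
being the filesystem name used in this thread); note that the extra ranks
are stopped one at a time and pass through the 'stopping' state before they
disappear from 'ceph fs status':
ceph fs set cephfs max_mds 1
ceph fs set cephfs allow_standby_replay false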
Hi,
using standby-replay daemons is something to test, as it can have a
negative impact; it really depends on the actual workload. We stopped
using standby-replay in all clusters we (help) maintain; in one specific
case with many active MDSs and a high load, the failover time decreased and
So I guess I'll end up doing:
ceph fs set cephfs max_mds 4
ceph fs set cephfs allow_standby_replay true
On Wed, May 24, 2023 at 4:13 PM Hector Martin wrote:
> Hi,
>
> On 24/05/2023 22.02, Emmanuel Jaep wrote:
> > Hi Hector,
> >
> > thank you very much for the detailed explanation and link to the
> > documentation.
Hi,
On 24/05/2023 22.02, Emmanuel Jaep wrote:
> Hi Hector,
>
> thank you very much for the detailed explanation and link to the
> documentation.
>
> Given our current situation (7 active MDSs and 1 standby MDS):
> RANK  STATE   MDS  ACTIVITY  DNS  INOS  DIRS  CAPS
>  0    active
Hi Hector,
thank you very much for the detailed explanation and link to the
documentation.
Given our current situation (7 active MDSs and 1 standby MDS):
RANK  STATE   MDS         ACTIVITY      DNS    INOS   DIRS   CAPS
 0    active  icadmin012  Reqs: 82 /s   2345k  2288k  97.2k  307k
 1    active
On 24/05/2023 21.15, Emmanuel Jaep wrote:
> Hi,
>
> we are currently running a ceph fs cluster at the following version:
> MDS version: ceph version 16.2.10
> (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
>
> The cluster is composed of 7 active MDSs and 1 standby MDS:
> RANK STATE