Hi Dan,
I opened the ticket last Friday:
https://tracker.ceph.com/issues/46978
Manuel
On Fri, 14 Aug 2020 17:49:55 +0200
Dan van der Ster wrote:
I think the best course of action would be to open a tracker ticket
with details about your environment and your observations, then the
devs could try to see if something was overlooked with this change.
-- dan
On Fri, Aug 14, 2020 at 5:48 PM Manuel Lausch wrote:
Hi,
I thought the "fail" needs to be propagated as well. Am I wrong?
Who can have a look at whether sending the mark-down message in "fast
shutdown" mode is a possibility? I do not have the expertise to say
whether this would break something else, but if it is possible I would
vote for it.
Thanks
Manuel
On Fri, 14 Aug 2020, Dan van der Ster wrote:
Hi,
I suppose the idea is that it's quicker to fail via the connection
refused setting than by waiting for an osdmap to be propagated across
the cluster.
It looks simple enough in OSD.cc to also send the down message to the
mon even with fast shutdown enabled. But I don't have any clue if that
would break something else.
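For illustration only, the change being discussed boils down to something
like the sketch below. This is hypothetical code, not the actual OSD.cc
shutdown path: the function names are made up, and the real implementation
would use Ceph's internal messaging (the mark-me-down message to the
monitor) rather than these placeholders.

// Simplified, hypothetical sketch of the idea above -- NOT the real
// OSD.cc code. notify_mon_im_down() and flush_and_stop_services() are
// invented placeholders for illustration.
#include <cstdlib>
#include <iostream>

// Assumed flag mirroring the osd_fast_shutdown option.
static bool osd_fast_shutdown = true;

// Hypothetical helper: tell the monitor this OSD is going down, so the
// mon can mark it down without waiting for failure reports from peers.
static void notify_mon_im_down()
{
    std::cout << "sending mark-me-down to the mon" << std::endl;
}

// Hypothetical helper standing in for the orderly teardown (flush
// journals, stop PGs, unmount the object store, ...).
static void flush_and_stop_services()
{
    std::cout << "orderly teardown of OSD services" << std::endl;
}

int shutdown_osd()
{
    if (osd_fast_shutdown) {
        // The proposal in this thread: even in fast-shutdown mode, tell
        // the mon first, then exit immediately instead of relying on
        // peers noticing the refused connections.
        notify_mon_im_down();
        std::exit(0);
    }
    // "Normal" shutdown: announce the shutdown, then tear down cleanly.
    notify_mon_im_down();
    flush_and_stop_services();
    return 0;
}

int main()
{
    return shutdown_osd();
}

The open question in the thread is exactly the caveat above: whether
sending that notification and then exiting immediately has side effects
that the orderly shutdown currently avoids.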
Hi Dan,
thank you for the link. I read it as well as the linked conversation in
the rook project.
I don't understand why the fast shutdown should be better than the
"normal" shutdown, in which the OSD announces its shutdown directly.
Are there cases where the shutdown of the OSD takes longer until its
There's a bit of discussion on this at the original PR:
https://github.com/ceph/ceph/pull/31677
Sage claims the IO interruption should be smaller with
osd_fast_shutdown than without.
-- dan
On Fri, Aug 14, 2020 at 10:08 AM Manuel Lausch wrote:
Hi Dan,
stopping a single OSD mostly took 1 to 2 seconds between the stop and
the first report in ceph.log. Stopping a whole node, in this case 24
OSDs, took 5 to 7 seconds in most cases. After the reporting, peering
begins, but that is quite fast.
Since I have the fast shutdown disabled. Th
OK I just wanted to confirm you hadn't extended the
osd_heartbeat_grace or similar.
On your large cluster, what is the time from stopping an osd (with
fast shutdown enabled) to:
cluster [DBG] osd.317 reported immediately failed by osd.202
-- dan
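As a rough way to pull that interval out of the logs, something like the
sketch below works. It is only illustrative and makes assumptions: ceph.log
lines start with a timestamp of the form YYYY-MM-DD HH:MM:SS, and you pass
the time at which you stopped the OSD as the first argument (grep plus a
stopwatch does the same job).

// Sketch: delta between a given stop time and the first
// "reported immediately failed" line in ceph.log.
// Assumption: the first 19 characters of each line are the timestamp.
#include <ctime>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

static std::time_t parse_ts(const std::string& s)
{
    std::tm tm{};
    std::istringstream in(s);
    in >> std::get_time(&tm, "%Y-%m-%d %H:%M:%S");
    return std::mktime(&tm);
}

int main(int argc, char** argv)
{
    if (argc < 3) {
        std::cerr << "usage: " << argv[0]
                  << " \"YYYY-MM-DD HH:MM:SS\" /path/to/ceph.log [osd.N]\n";
        return 1;
    }
    const std::time_t stop = parse_ts(argv[1]);
    const std::string osd  = (argc > 3) ? argv[3] : "";

    std::ifstream log(argv[2]);
    if (!log) {
        std::cerr << "cannot open " << argv[2] << "\n";
        return 1;
    }
    std::string line;
    while (std::getline(log, line)) {
        if (line.find("reported immediately failed") == std::string::npos)
            continue;
        if (!osd.empty() && line.find(osd) == std::string::npos)
            continue;
        const std::time_t seen = parse_ts(line.substr(0, 19));
        std::cout << "delta: " << std::difftime(seen, stop) << " s\n"
                  << line << "\n";
        return 0;
    }
    std::cerr << "no matching line found\n";
    return 1;
}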
On Thu, Aug 13, 2020 at 4:38 PM Manuel Lausch wrote:
Hi Dan,
The only settings in my ceph.conf related to down/out and peering are
these:
mon osd down out interval = 1800
mon osd down out subtree limit = host
mon osd min down reporters = 3
mon osd reporter subtree level = host
The cluster has 44 hosts with 24 OSDs each (1056 OSDs in total).
Manuel
On Thu, 13 Aug 2020, Dan van der Ster wrote:
Hi Manuel,
Just to clarify -- do you override any of the settings related to peer
down detection? heartbeat periods or timeouts or min down reporters
or anything like that?
Cheers, Dan
On Thu, Aug 13, 2020 at 3:46 PM Manuel Lausch wrote:
>
> Hi,
>
> I investigated another problem with my nau