If we can't replace a drive on a node in a crash situation without blowing away the entire node... seems to me Ceph Octopus fails the "test" part of the "test cluster" :-/
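For the record, this is the single-OSD replacement flow I'd expect to work on Octopus using ceph-volume (the successor to ceph-disk). A hedged sketch only, run on the affected node; the OSD id (7) and device (/dev/sdh) are placeholders, and cephadm-managed clusters may need the `ceph orch` variants instead:

```shell
# Sketch: replace one failed OSD without rebuilding the node.
# osd.7 and /dev/sdh are placeholders -- substitute your own.

# 1. Mark the OSD out and stop its daemon.
ceph osd out 7
systemctl stop ceph-osd@7          # under cephadm: ceph orch daemon stop osd.7

# 2. Destroy the OSD entry but keep its id so the replacement can reuse it.
ceph osd destroy 7 --yes-i-really-mean-it

# 3. Wipe the replacement device, including any leftover LVM metadata.
ceph-volume lvm zap /dev/sdh --destroy

# 4. Recreate the OSD on the new drive, reusing the old id.
ceph-volume lvm create --osd-id 7 --data /dev/sdh
```

Whether step 3/4 behaves when the SSD holding the DB/WAL slices is the thing that died (as in this thread) is exactly the untested part.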
I vaguely recall running into this "doesn't have PARTUUID" problem before. THAT time, I did end up wiping the entire machine, I think. But to prepare for production use, I really need a better documented method.

I note that I can't even fall back to "ceph-disk", since that is no longer in the distribution, it would seem. That would be the "easy" way to deal with this... but it is not here.

----- Original Message -----
From: "Stefan Kooman" <[email protected]>
To: "Philip Brown" <[email protected]>
Cc: "ceph-users" <[email protected]>
Sent: Friday, March 19, 2021 12:04:30 PM
Subject: Re: [ceph-users] ceph octopus mysterious OSD crash

On 3/19/21 7:47 PM, Philip Brown wrote:
> I see.
>
> I don't think it works when 7/8 devices are already configured, and the
> SSD is already mostly sliced.

OK. If it is a test cluster you might just blow it all away. By doing this you are simulating an "SSD" failure taking down all HDDs with it.

It sure isn't pretty. I would say the situation you ended up with is not a corner case by any means. I am afraid I would really need to set up a test cluster with cephadm to help you further at this point, besides the suggestion above.

Gr. Stefan

_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
