If we can't replace a drive on a node in a crash situation without blowing away the entire node... seems to me Ceph Octopus fails the "test" part of the "test cluster" :-/
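For the record, this is the single-OSD replacement flow I'd expect to work on Octopus using ceph-volume (the successor to ceph-disk). A hedged sketch only, run on the affected node; the OSD id (7) and device (/dev/sdh) are placeholders, and cephadm-managed clusters may need the `ceph orch` variants instead:

```shell
# Sketch: replace one failed OSD without rebuilding the node.
# osd.7 and /dev/sdh are placeholders -- substitute your own.

# 1. Mark the OSD out and stop its daemon.
ceph osd out 7
systemctl stop ceph-osd@7          # under cephadm: ceph orch daemon stop osd.7

# 2. Destroy the OSD entry but keep its id so the replacement can reuse it.
ceph osd destroy 7 --yes-i-really-mean-it

# 3. Wipe the replacement device, including any leftover LVM metadata.
ceph-volume lvm zap /dev/sdh --destroy

# 4. Recreate the OSD on the new drive, reusing the old id.
ceph-volume lvm create --osd-id 7 --data /dev/sdh
```

Whether step 3/4 behaves when the SSD holding the DB/WAL slices is the thing that died (as in this thread) is exactly the untested part.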
I vaguely recall running into this "doesn't have PARTUUID" problem before. THAT time, I did end up wiping the entire machine, I think. But to prepare for production use, I really need a better documented method.

I note that I can't even fall back to "ceph-disk", since that is no longer in the distribution, it would seem. That would be the "easy" way to deal with this... but it is not here.

----- Original Message -----
From: "Stefan Kooman" <[email protected]>
To: "Philip Brown" <[email protected]>
Cc: "ceph-users" <[email protected]>
Sent: Friday, March 19, 2021 12:04:30 PM
Subject: Re: [ceph-users] ceph octopus mysterious OSD crash

On 3/19/21 7:47 PM, Philip Brown wrote:
> I see.
>
> I don't think it works when 7/8 devices are already configured, and the
> SSD is already mostly sliced.

OK. If it is a test cluster you might just blow it all away. By doing this you are simulating an "SSD" failure taking down all HDDs with it.

It sure isn't pretty. I would say the situation you ended up with is not a corner case by any means. I am afraid I would really need to set up a test cluster with cephadm to help you further at this point, besides the suggestion above.

Gr. Stefan

_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
