[ceph-users] Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption

2022-02-21 Thread Sebastian Mazza
Hey Igor! > thanks a lot for the new logs - looks like they provide some insight. I'm glad the logs are helpful. > At this point I think the root cause is apparently a race between deferred > writes replay and some DB maintenance task happening on OSD startup. It seems > that deferred writ

[ceph-users] Re: Reducing ceph cluster size in half

2022-02-21 Thread Etienne Menguy
Hi, there are different ways, but I would: - change the crush weight (not the reweight) of the OSDs I want to remove to 0 - wait for the cluster to be healthy - stop the OSDs I want to remove - if the data is OK, remove the OSDs from the crush map. - There is no reason stopping the OSDs should impact your service, as they hold no data; it's
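A rough command-line sketch of those steps, assuming osd.123 stands in for each OSD being removed (on a cephadm cluster the daemon would be stopped via ceph orch rather than systemctl):

  ceph osd crush reweight osd.123 0    # crush weight, not reweight
  ceph -s                              # wait until all PGs are active+clean again
  systemctl stop ceph-osd@123          # stop the now-empty OSD
  ceph osd crush remove osd.123        # drop it from the crush map
  ceph auth del osd.123
  ceph osd rm osd.123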

[ceph-users] Re: Reducing ceph cluster size in half

2022-02-21 Thread Matt Vandermeulen
This might be easiest to approach in two steps: draining hosts, and doing a PG merge. You can do it in either order (though thinking about it, doing the merge first will give you more cluster-wide resources to do it faster). Draining the hosts can be done in a few ways, too. If you want t
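For the merge half, a minimal sketch, assuming a pool named mypool (since Nautilus the mons step pg_num down gradually, so a single set is enough); the target of 256 is only an example sized for the remaining OSDs:

  ceph osd pool get mypool pg_num       # current value
  ceph osd pool set mypool pg_num 256   # target after the merge

For the draining half, recent cephadm releases also offer "ceph orch host drain <host>" as an alternative to zeroing crush weights by hand.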

[ceph-users] Reducing ceph cluster size in half

2022-02-21 Thread Jason Borden
Hi all, I'm looking for some advice on reducing my ceph cluster in half. I currently have 40 hosts and 160 osds on a cephadm managed pacific cluster. The storage space is only 12% utilized. I want to reduce the cluster to 20 hosts and 80 osds while keeping the cluster operational. I'd prefer

[ceph-users] ceph os filesystem in read only - mgr bug

2022-02-21 Thread Marc
Interesting from this situation is: Feb 21 21:44:46 ceph-mgr: 2022-02-21 20:44:46.913 7f70896f9700 -1 os_release_parse - failed to open /etc/os-release: (5) Input/output error Feb 21 21:44:46 ceph-mgr: 2022-02-21 20:44:46.913 7f70896f9700 -1 distro_detect - /etc/os-release is required Feb 21

[ceph-users] Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption

2022-02-21 Thread Igor Fedotov
Hey Sebastian, thanks a lot for the new logs - looks like they provide some insight. At this point I think the root cause is apparently a race between deferred writes replay and some DB maintenance task happening on OSD startup. It seems that deferred write replay updates a block extent whic

[ceph-users] ceph os filesystem in read only

2022-02-21 Thread Marc
I have a ceph node that has an OS filesystem going into read-only for whatever reason[1]. 1. How long will ceph continue to run before it starts complaining about this? It looks like it is fine for a few hours; ceph osd tree and ceph -s seem not to notice anything. 2. This is still nautilus w

[ceph-users] Re: Ceph EC K+M

2022-02-21 Thread Eugen Block
The customer's requirement was to sustain the loss of one of two datacenters and two additional hosts. The crush failure domain is "host". There are 10 hosts in each DC, so we put 9 chunks in each DC to be able to recover completely if one host fails. This worked quite nicely already, they
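For reference, one way to express "9 chunks in each of two DCs, at most one per host" is a hand-edited crush rule of roughly the following shape (the rule name and id here are made up):

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  # add a rule along these lines to crushmap.txt:
  #   rule ec_2dc_9each {
  #       id 5
  #       type erasure
  #       step take default
  #       step choose indep 2 type datacenter
  #       step chooseleaf indep 9 type host
  #       step emit
  #   }
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new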

[ceph-users] Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption

2022-02-21 Thread Sebastian Mazza
Hi Igor, today (21-02-2022) at 13:49:28.452+0100 I crashed OSD 7 again. And this time I have logs with “debug bluefs = 20” and “debug bdev = 20” for every OSD in the cluster! It was the OSD with ID 7 again. So the HDD has now failed for the third time! Coincidence? Probably not… The import
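For anyone wanting to reproduce that logging level, a sketch using the config database (the option names are debug_bluefs and debug_bdev):

  ceph config set osd debug_bluefs 20
  ceph config set osd debug_bdev 20
  # or per OSD:
  ceph config set osd.7 debug_bluefs 20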

[ceph-users] Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption

2022-02-21 Thread Sebastian Mazza
Hi Igor, please find the startup log under the following link: https://we.tl/t-E6CadpW1ZL It also includes the “normal” log of that OSD from the day before the crash and the RocksDB sst file with the “Bad table magic number” (db/001922.sst) Best regards, Sebastian > On 21.02.2022, at

[ceph-users] Re: When is the ceph.conf file evaluated?

2022-02-21 Thread Janne Johansson
On Mon, 21 Feb 2022 at 14:17, Ackermann, Christoph wrote: > > OK, I think it would be fine not to use a mashed-up version, since I actually use three > old and four new monitors. I think it's OK to have all old and new there, even if one or two out of all the mons are currently unavailable; the clie

[ceph-users] Re: When is the ceph.conf file evaluated?

2022-02-21 Thread Ackermann, Christoph
OK, I think it would be fine not to use a mashed-up version, since I actually use three old and four new monitors. Thanks a lot, Christoph. PS: I just use a "minimal" FSID and monitor list in our ceph.conf file. On Mon, 21 Feb 2022 at 14:03, Janne Johansson < icepic...@gmail.com> wrote: > De
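Such a minimal ceph.conf might look like the following (the fsid and addresses are placeholders; mon_host can list both old and new monitors during the migration):

  [global]
  fsid = 00000000-0000-0000-0000-000000000000
  mon_host = 192.168.0.1,192.168.0.2,192.168.0.3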

[ceph-users] Re: Problem with Ceph daemons

2022-02-21 Thread Adam King
I'd say you probably don't need both services. It looks like they're configured to listen on the same port (80, from the output) and are being placed on the same hosts (c01-c06). It could be that port conflict that is causing the rgw daemons to go into error state. Cephadm will try to put 2 down on e
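If both rgw services really are wanted, one way around the port conflict is to give them distinct frontend ports in their service specs (the service_id, hosts and port below are only illustrative), then apply the spec with "ceph orch apply -i <file>":

  service_type: rgw
  service_id: second
  placement:
    hosts:
      - c01
      - c02
      - c03
  spec:
    rgw_frontend_port: 8080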

[ceph-users] Re: When is the ceph.conf file evaluated?

2022-02-21 Thread Janne Johansson
On Mon, 21 Feb 2022 at 13:55, Ackermann, Christoph wrote: > > Dear all, > > I'm in the process of changing five CentOS 7 monitors to Rocky 8. So when is the > ceph.conf file evaluated? Only on startup of a ceph-xyz daemon, or > dynamically? Is it worth generating an intermediate file containing some > old an

[ceph-users] When is the ceph.conf file evaluated?

2022-02-21 Thread Ackermann, Christoph
Dear all, I'm in the process of changing five CentOS 7 monitors to Rocky 8. So when is the ceph.conf file evaluated? Only on startup of a ceph-xyz daemon, or dynamically? Is it worth generating an intermediate file containing some old and some new ceph monitor hosts/IPs for client computers? Thanks for any

[ceph-users] Re: Ceph EC K+M

2022-02-21 Thread Eugen Block
Hi, it really depends on the resiliency requirements and the use case. We have a couple of customers with EC profiles like k=7 m=11. The potential waste of space, as Anthony already mentions, has to be considered, of course. But with regard to performance we haven't heard any complaints ye
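For the archives, a profile like that is created along these lines (profile and pool names are placeholders; a custom two-DC crush rule as sketched in the other mail would then replace the rule the profile generates):

  ceph osd erasure-code-profile set ec-k7-m11 k=7 m=11 crush-failure-domain=host
  ceph osd pool create ecpool 32 32 erasure ec-k7-m11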

[ceph-users] Re: ceph-mgr : ModuleNotFoundError: No module named 'requests'

2022-02-21 Thread Ernesto Puerta
Hi Florent, Can you please check if the location where the Python Requests package is installed is the same for Buster and Bullseye? - https://debian.pkgs.org/10/debian-main-amd64/python3-requests_2.21.0-1_all.deb.html - https://debian.pkgs.org/11/debian-main-amd64/python3-requests_
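A quick way to compare the two locations, assuming python3-requests came from the Debian package on both releases:

  python3 -c 'import requests; print(requests.__version__, requests.__file__)'
  dpkg -L python3-requests | grep -m1 __init__.py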

[ceph-users] Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption

2022-02-21 Thread Igor Fedotov
Hi Sebastian, could you please share the failing OSD's startup log? Thanks, Igor On 2/20/2022 5:10 PM, Sebastian Mazza wrote: Hi Igor, it happened again. One of the OSDs that crashed last time has a corrupted RocksDB again. Unfortunately, once again I do not have debug logs from the OSDs. I was coll