Hello Theo,

Another method is to set a dummy CRUSH device class, e.g. "import", on the temporary OSD, one that is not matched by any CRUSH rule. Obviously, this only works if all of your CRUSH rules specify a device class and you have set a non-default CRUSH rule for the .mgr pool.
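For concreteness, a minimal sketch of the device-class trick (osd.10 and the "import" class name are just examples, not from a real cluster):

```shell
# Clear the automatically detected class first; set-device-class
# refuses to overwrite an existing class.
ceph osd crush rm-device-class osd.10
ceph osd crush set-device-class import osd.10

# Later, when the imports are done and the OSD should serve data normally:
ceph osd crush rm-device-class osd.10
ceph osd crush set-device-class hdd osd.10
```

Because no CRUSH rule selects the "import" class, the OSD receives no PG mappings even with a non-zero CRUSH weight.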
On Tue, Mar 3, 2026 at 3:30 PM Eugen Block via ceph-users <[email protected]> wrote:
>
> Hi,
>
> I didn't read all the details of this thread, but if you want to
> prevent freshly created OSDs from receiving recovery traffic, you
> might want to set:
>
> ceph config set osd osd_crush_initial_weight 0
>
> This allows you to create and start OSDs without a crush weight, so
> there won't be any traffic to this OSD at all until you raise its
> crush weight. But the OSD will be up and in, which should allow you to
> import PGs anyway.
>
> Zitat von Theo Cabrerizo Diem via ceph-users <[email protected]>:
>
> > Hello Igor, hello all,
> >
> > First, I've already accepted that my data is most likely
> > unrecoverable by now due to my own fault. I'm using this to learn and
> > hopefully document the information gained, as I couldn't find much recent
> > information about the recovery process, and to improve my
> > understanding of Ceph, if someone is willing to further chime in.
> >
> > The current situation is that I've had multiple OSD failures, and some of
> > the ceph-osd processes refuse to start (corruption in RocksDB). I've
> > decided to follow
> > https://www.croit.io/blog/how-to-recover-inactive-pgs-using-ceph-objectstore-tool-on-ceph-clusters
> > as a suggested mechanism to export the PGs and re-import them on a fresh OSD.
> >
> > I made an attempt (sort of documented in this thread) to do it, with
> > "unexpected" results. I have written down a lot of information, states,
> > "ceph pg query" outputs, etc. (so I can provide them if relevant). I have
> > not marked any of the OSDs as lost at any time, and the monitors have been
> > running without issue since the beginning.
> >
> > I have some questions regarding my observations while following the
> > information on that blog (sorry for my lack of experience):
> >
> > - Is the process for setting up a new "temporary" OSD to import PGs
> > correct? (Short story: "ceph-volume lvm prepare", start the OSD, and as soon
> > as possible run "ceph osd crush reweight osd.XX 0".)
> >
> > - After creating the OSD as described above, running "ceph-objectstore-tool
> > --op list-pgs" on this OSD showed lots of PGs (which I assume were
> > "pre-allocated" by CRUSH), but "ceph osd df" confirmed almost no data was on
> > the OSD (very little, less than 2 GB). Is there a way to have an
> > OSD "flushed out" so I can import further PGs?
> >
> > - Running "ceph pg XX.XX query" on some of the imported PGs after
> > starting ceph-osd again doesn't seem to reliably reflect my progress. Is
> > there a different way? Is it because the PG is still in the down state
> > because of the dead OSDs?
> > For example, pg 11.17 had only shard 0 available because only one
> > OSD was up. I've imported shard 1 into osd.10, but "ceph pg 11.17 query"
> > shows under "recovery_state":
> >
> > "intervals": [
> >     {
> >         "first": "2882",
> >         "last": "2883",
> >         "acting": "1(1),3(0)"
> >     },
> >     {
> >         "first": "3021",
> >         "last": "3023",
> >         "acting": "3(0),8(1)"
> >     },
> >     {
> >         "first": "3024",
> >         "last": "3026",
> >         "acting": "3(0),8(1),10(2)"
> >     }
> > ]
> >
> > Running "ceph-objectstore-tool --op list-pgs" on osd.10 (stopped) confirms
> > that 11.17s1 is listed, and running it on osd.8 (stopped) doesn't show
> > 11.17 at all (none of its shards).
> >
> > Should I instead keep track of my progress using "ceph-objectstore-tool
> > --op list", looking for the "oid"s present?
> >
> > This might reflect my lack of knowledge of how a Ceph OSD works
> > internally, so feel free to correct me or suggest a better approach. I
> > still have the 3 original OSDs (out of 4; one, as mentioned in the thread,
> > has worse corruption and ceph-objectstore-tool fails on it) and 8x 2 TB
> > disks that I can add as new OSDs to import the old data onto (I had less
> > than 6 TB used before the crash).
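Assuming the croit procedure being discussed, the temporary-OSD round trip looks roughly like this (the device path, OSD ids, PG id, and file path are illustrative, not taken from the cluster above):

```shell
# Optional, per Eugen's suggestion: new OSDs come up with CRUSH weight 0
ceph config set osd osd_crush_initial_weight 0

# Create and start the temporary OSD
ceph-volume lvm prepare --data /dev/sdX
ceph-volume lvm activate --all
ceph osd crush reweight osd.10 0   # belt and braces if the option above is unset

# Export a shard from the broken OSD (its daemon must be stopped)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
    --pgid 11.17s1 --op export --file /root/pg.11.17s1.dat

# Import it into the temporary OSD; the tool also requires that OSD stopped
systemctl stop ceph-osd@10
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-10 \
    --op import --file /root/pg.11.17s1.dat
systemctl start ceph-osd@10
```

With weight 0, the cluster should not backfill onto the temporary OSD, but peering can still pick up the imported shard once the daemon is up and in.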
> > > > Should I continue exporting all PGs and keep importing them this way? > > > > Thanks > > > > > > On Sat, 28 Feb 2026 at 16:50, Theo Cabrerizo Diem via ceph-users < > > [email protected]> wrote: > > > >> Hello all, > >> > >> I've managed to get a bunch of 2Tb disks for setting up a few OSDs but > >> before I even started adding them to my monitors, I decided to check my > >> cluster state and noticed another OSD died. Trying to start it, revealed a > >> rocksdb corruption: > >> > >> # /usr/bin/ceph-osd -f --id "1" --osd-data "/var/lib/ceph/osd" --cluster > >> "ceph" --setuser "ceph" --setgroup "ceph" > >> 2026-02-28T15:34:16.596+0000 7f793ce718c0 -1 Falling back to public > >> interface > >> > >> 2026-02-28T15:35:04.304+0000 7f792c1c5640 -1 rocksdb: submit_common error: > >> Corruption: block checksum mismatch: stored = 0, computed = 1265684702, > >> type = 4 in db/180433.sst offset 1048892 size 1429 code = ☻ Rocksdb > >> transaction: > >> PutCF( prefix = O key = > >> 0x7F8000000000000006D0000000'!!='0xFFFFFFFFFFFFFFFEFFFFFFFFFFFFFFFF6F value > >> size = 33) > >> PutCF( prefix = S key = 'nid_max' value size = 8) > >> PutCF( prefix = S key = 'blobid_max' value size = 8) > >> /ceph/rpmbuild/BUILD/ceph-20.2.0/src/os/bluestore/BlueStore.cc: In function > >> 'void BlueStore::_txc_apply_kv(TransContext*, bool)' thread 7f792c1c5640 > >> time 2026-02-28T15:35:04.305926+0000 > >> /ceph/rpmbuild/BUILD/ceph-20.2.0/src/os/bluestore/BlueStore.cc: 14539: > >> FAILED ceph_assert(r == 0) > >> ceph version 20.2.0 (69f84cc2651aa259a15bc192ddaabd3baba07489) tentacle > >> (stable - RelWithDebInfo) > >> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >> const*)+0x11f) [0x557b7cbf6236] > >> 2: /usr/bin/ceph-osd(+0x44a3ef) [0x557b7cbd03ef] > >> 3: (BlueStore::_kv_sync_thread()+0xaf1) [0x557b7d276191] > >> 4: /usr/bin/ceph-osd(+0xa790d1) [0x557b7d1ff0d1] > >> 5: /lib64/libc.so.6(+0x8b2fa) [0x7f793d3382fa] > >> 6: /lib64/libc.so.6(+0x110400) [0x7f793d3bd400] > >> 
2026-02-28T15:35:04.309+0000 7f792c1c5640 -1 > >> /ceph/rpmbuild/BUILD/ceph-20.2.0/src/os/bluestore/BlueStore.cc: In function > >> 'void BlueStore::_txc_apply_kv(TransContext*, bool)' thread 7f792c1c5640 > >> time 2026-02-28T15:35:04.305926+0000 > >> /ceph/rpmbuild/BUILD/ceph-20.2.0/src/os/bluestore/BlueStore.cc: 14539: > >> FAILED ceph_assert(r == 0) > >> > >> ceph version 20.2.0 (69f84cc2651aa259a15bc192ddaabd3baba07489) tentacle > >> (stable - RelWithDebInfo) > >> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >> const*)+0x11f) [0x557b7cbf6236] > >> 2: /usr/bin/ceph-osd(+0x44a3ef) [0x557b7cbd03ef] > >> 3: (BlueStore::_kv_sync_thread()+0xaf1) [0x557b7d276191] > >> 4: /usr/bin/ceph-osd(+0xa790d1) [0x557b7d1ff0d1] > >> 5: /lib64/libc.so.6(+0x8b2fa) [0x7f793d3382fa] > >> 6: /lib64/libc.so.6(+0x110400) [0x7f793d3bd400] > >> > >> *** Caught signal (Aborted) ** > >> in thread 7f792c1c5640 thread_name:bstore_kv_sync > >> ceph version 20.2.0 (69f84cc2651aa259a15bc192ddaabd3baba07489) tentacle > >> (stable - RelWithDebInfo) > >> 1: /lib64/libc.so.6(+0x3fc30) [0x7f793d2ecc30] > >> 2: /lib64/libc.so.6(+0x8d03c) [0x7f793d33a03c] > >> 3: raise() > >> 4: abort() > >> 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >> const*)+0x17a) [0x557b7cbf6291] > >> 6: /usr/bin/ceph-osd(+0x44a3ef) [0x557b7cbd03ef] > >> 7: (BlueStore::_kv_sync_thread()+0xaf1) [0x557b7d276191] > >> 8: /usr/bin/ceph-osd(+0xa790d1) [0x557b7d1ff0d1] > >> 9: /lib64/libc.so.6(+0x8b2fa) [0x7f793d3382fa] > >> 10: /lib64/libc.so.6(+0x110400) [0x7f793d3bd400] > >> 2026-02-28T15:35:04.320+0000 7f792c1c5640 -1 *** Caught signal (Aborted) ** > >> in thread 7f792c1c5640 thread_name:bstore_kv_sync > >> > >> ceph version 20.2.0 (69f84cc2651aa259a15bc192ddaabd3baba07489) tentacle > >> (stable - RelWithDebInfo) > >> 1: /lib64/libc.so.6(+0x3fc30) [0x7f793d2ecc30] > >> 2: /lib64/libc.so.6(+0x8d03c) [0x7f793d33a03c] > >> 3: raise() > >> 4: abort() > >> 5: (ceph::__ceph_assert_fail(char 
const*, char const*, int, char > >> const*)+0x17a) [0x557b7cbf6291] > >> 6: /usr/bin/ceph-osd(+0x44a3ef) [0x557b7cbd03ef] > >> 7: (BlueStore::_kv_sync_thread()+0xaf1) [0x557b7d276191] > >> 8: /usr/bin/ceph-osd(+0xa790d1) [0x557b7d1ff0d1] > >> 9: /lib64/libc.so.6(+0x8b2fa) [0x7f793d3382fa] > >> 10: /lib64/libc.so.6(+0x110400) [0x7f793d3bd400] > >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > >> to interpret this. > >> > >> -2421> 2026-02-28T15:34:16.596+0000 7f793ce718c0 -1 Falling back to public > >> interface > >> -8> 2026-02-28T15:35:04.304+0000 7f792c1c5640 -1 rocksdb: submit_common > >> error: Corruption: block checksum mismatch: stored = 0, computed = > >> 1265684702, type = 4 in db/180433.sst offset 1048892 size 1429 code = ☻ > >> Rocksdb transaction: > >> PutCF( prefix = O key = > >> 0x7F8000000000000006D0000000'!!='0xFFFFFFFFFFFFFFFEFFFFFFFFFFFFFFFF6F value > >> size = 33) > >> PutCF( prefix = S key = 'nid_max' value size = 8) > >> PutCF( prefix = S key = 'blobid_max' value size = 8) > >> -7> 2026-02-28T15:35:04.309+0000 7f792c1c5640 -1 > >> /ceph/rpmbuild/BUILD/ceph-20.2.0/src/os/bluestore/BlueStore.cc: In function > >> 'void BlueStore::_txc_apply_kv(TransContext*, bool)' thread 7f792c1c5640 > >> time 2026-02-28T15:35:04.305926+0000 > >> /ceph/rpmbuild/BUILD/ceph-20.2.0/src/os/bluestore/BlueStore.cc: 14539: > >> FAILED ceph_assert(r == 0) > >> > >> ceph version 20.2.0 (69f84cc2651aa259a15bc192ddaabd3baba07489) tentacle > >> (stable - RelWithDebInfo) > >> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >> const*)+0x11f) [0x557b7cbf6236] > >> 2: /usr/bin/ceph-osd(+0x44a3ef) [0x557b7cbd03ef] > >> 3: (BlueStore::_kv_sync_thread()+0xaf1) [0x557b7d276191] > >> 4: /usr/bin/ceph-osd(+0xa790d1) [0x557b7d1ff0d1] > >> 5: /lib64/libc.so.6(+0x8b2fa) [0x7f793d3382fa] > >> 6: /lib64/libc.so.6(+0x110400) [0x7f793d3bd400] > >> > >> 0> 2026-02-28T15:35:04.320+0000 7f792c1c5640 -1 *** Caught signal > >> (Aborted) ** > >> in 
thread 7f792c1c5640 thread_name:bstore_kv_sync > >> > >> ceph version 20.2.0 (69f84cc2651aa259a15bc192ddaabd3baba07489) tentacle > >> (stable - RelWithDebInfo) > >> 1: /lib64/libc.so.6(+0x3fc30) [0x7f793d2ecc30] > >> 2: /lib64/libc.so.6(+0x8d03c) [0x7f793d33a03c] > >> 3: raise() > >> 4: abort() > >> 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >> const*)+0x17a) [0x557b7cbf6291] > >> 6: /usr/bin/ceph-osd(+0x44a3ef) [0x557b7cbd03ef] > >> 7: (BlueStore::_kv_sync_thread()+0xaf1) [0x557b7d276191] > >> 8: /usr/bin/ceph-osd(+0xa790d1) [0x557b7d1ff0d1] > >> 9: /lib64/libc.so.6(+0x8b2fa) [0x7f793d3382fa] > >> 10: /lib64/libc.so.6(+0x110400) [0x7f793d3bd400] > >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > >> to interpret this. > >> > >> -2445> 2026-02-28T15:34:16.596+0000 7f793ce718c0 -1 Falling back to public > >> interface > >> -32> 2026-02-28T15:35:04.304+0000 7f792c1c5640 -1 rocksdb: submit_common > >> error: Corruption: block checksum mismatch: stored = 0, computed = > >> 1265684702, type = 4 in db/180433.sst offset 1048892 size 1429 code = ☻ > >> Rocksdb transaction: > >> PutCF( prefix = O key = > >> 0x7F8000000000000006D0000000'!!='0xFFFFFFFFFFFFFFFEFFFFFFFFFFFFFFFF6F value > >> size = 33) > >> PutCF( prefix = S key = 'nid_max' value size = 8) > >> PutCF( prefix = S key = 'blobid_max' value size = 8) > >> -31> 2026-02-28T15:35:04.309+0000 7f792c1c5640 -1 > >> /ceph/rpmbuild/BUILD/ceph-20.2.0/src/os/bluestore/BlueStore.cc: In function > >> 'void BlueStore::_txc_apply_kv(TransContext*, bool)' thread 7f792c1c5640 > >> time 2026-02-28T15:35:04.305926+0000 > >> /ceph/rpmbuild/BUILD/ceph-20.2.0/src/os/bluestore/BlueStore.cc: 14539: > >> FAILED ceph_assert(r == 0) > >> > >> ceph version 20.2.0 (69f84cc2651aa259a15bc192ddaabd3baba07489) tentacle > >> (stable - RelWithDebInfo) > >> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >> const*)+0x11f) [0x557b7cbf6236] > >> 2: /usr/bin/ceph-osd(+0x44a3ef) 
[0x557b7cbd03ef] > >> 3: (BlueStore::_kv_sync_thread()+0xaf1) [0x557b7d276191] > >> 4: /usr/bin/ceph-osd(+0xa790d1) [0x557b7d1ff0d1] > >> 5: /lib64/libc.so.6(+0x8b2fa) [0x7f793d3382fa] > >> 6: /lib64/libc.so.6(+0x110400) [0x7f793d3bd400] > >> > >> -24> 2026-02-28T15:35:04.320+0000 7f792c1c5640 -1 *** Caught signal > >> (Aborted) ** > >> in thread 7f792c1c5640 thread_name:bstore_kv_sync > >> > >> ceph version 20.2.0 (69f84cc2651aa259a15bc192ddaabd3baba07489) tentacle > >> (stable - RelWithDebInfo) > >> 1: /lib64/libc.so.6(+0x3fc30) [0x7f793d2ecc30] > >> 2: /lib64/libc.so.6(+0x8d03c) [0x7f793d33a03c] > >> 3: raise() > >> 4: abort() > >> 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >> const*)+0x17a) [0x557b7cbf6291] > >> 6: /usr/bin/ceph-osd(+0x44a3ef) [0x557b7cbd03ef] > >> 7: (BlueStore::_kv_sync_thread()+0xaf1) [0x557b7d276191] > >> 8: /usr/bin/ceph-osd(+0xa790d1) [0x557b7d1ff0d1] > >> 9: /lib64/libc.so.6(+0x8b2fa) [0x7f793d3382fa] > >> 10: /lib64/libc.so.6(+0x110400) [0x7f793d3bd400] > >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > >> to interpret this. > >> > >> Aborted (core dumped) > >> > >> There was no sign of hardware failure on kernel logs. I think, for now, I > >> should move away from alpine binaries and use ceph's official containers. > >> Is there any tool that can try to fix this rocksdb issue? Or is the > >> recommended way to export all PGs from this osd and re-import on a new one? > >> > >> I agree that at this point I should consider rechecking all the hardware > >> involved. I plan to decommission this system once I (if possible) I get the > >> data out. > >> > >> Thanks, > >> Theo > >> > >> On Tue, 24 Feb 2026 at 22:22, Theo Cabrerizo Diem <[email protected]> > >> wrote: > >> > >> > Hello Igor, > >> > > >> > > Just in case - didn't you overwrite existing PG replicas at target > >> OSDs > >> > > when exporting PGs back to OSDs 1&3? 
> >> >
> >> > Now that you mention that an OSD cannot hold two shards of the same PG,
> >> > I think I have now put myself in a tricky place.
> >> > I had 4 OSDs; two "died", leaving two online, so when I exported
> >> > shards from osd.2 I ended up importing them into osd.3 (which had other
> >> > shards of the same PGs).
> >> >
> >> > It doesn't look "that bad"... (*looking up to the sky*)...
> >> > ## on osd.3
> >> > # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ --op list-pgs |
> >> > grep 16.b
> >> > 16.bs1
> >> > 16.bs0
> >> > (and s2 should be on osd.1; s1 is the one I imported)
> >> >
> >> > ## ceph health reports
> >> > pg 16.b is down, acting [3,NONE,1]
> >> >
> >> > I might have some spare HDDs to set up a new OSD (only one, though) with
> >> > enough space for it to balance out, but as I've mentioned, there are
> >> > quite a bunch of PGs in the down state. Should I set up a new OSD and
> >> > wait until it "settles"? The exported shards from the dead osd.2 that
> >> > ceph health was complaining about sum to about 400 GB.
> >> >
> >> > Is it possible to set up an OSD and have Ceph not push too many shards
> >> > onto it? (The replicated pools are "small".) I ask this because it
> >> > should be easy to get a 1 TB/2 TB disk, but if it needs to balance
> >> > completely for the PGs to get "unstuck", I need a bigger disk, which I
> >> > don't have at the moment (and current market prices aren't the best for
> >> > getting one).
> >> >
> >> > Another option: if osd.0 is gone for good (the one that doesn't even
> >> > allow me to run "ceph-objectstore-tool --op list-pgs"), I can wipe its
> >> > disk/clone and use that instead; it has the same capacity as the other
> >> > OSDs (I used a spare I had at home). If you believe it is still
> >> > workable, I can try to find some more disks and not touch those for the
> >> > moment.
> >> >
> >> > Thanks for the patience so far.
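As a sketch of how to verify which shards physically live on an OSD versus what the cluster believes (osd.3 and pg 16.b taken from the situation above; the tool only opens a stopped OSD):

```shell
# List every PG shard physically present on the OSD's store;
# for EC pools the shard index is the "sN" suffix (e.g. 16.bs0, 16.bs1).
systemctl stop ceph-osd@3
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
    --op list-pgs | grep '^16\.b'
systemctl start ceph-osd@3

# Cross-check against the acting set the cluster expects:
ceph pg map 16.b
```

Two shards of the same PG on one OSD, as shown here, is a state CRUSH would never produce on its own; it can only result from a manual import.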
> >> >
> >> > On Tue, 24 Feb 2026 at 16:39, Igor Fedotov <[email protected]> wrote:
> >> >
> >> >> Hi Theo,
> >> >>
> >> >> Sorry, I can't tell for sure what marking an OSD lost would do to its
> >> >> encryption keys. Likely, yes, they'll be lost.
> >> >>
> >> >> But instead of going this way, I'd rather suggest you add another two
> >> >> OSDs and let Ceph recover more PG replicas onto them.
> >> >>
> >> >> Just in case - didn't you overwrite existing PG replicas at the target
> >> >> OSDs when importing PGs back to OSDs 1 & 3? The same PG can't have two
> >> >> replicas/shards on a single OSD while your OSD count is pretty limited.
> >> >> Just curious for now - that still shouldn't be an issue, given you have
> >> >> at least 2 replicas/shards for all the pools anyway.
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Igor
> >> >>
> >> >> On 2/21/2026 11:00 PM, Theo Cabrerizo Diem via ceph-users wrote:
> >> >> > Hello Igor,
> >> >> >
> >> >> > First of all, sorry about the late reply. It took me a while to export
> >> >> > all the shards that weren't available from osd.2 (1 and 3 were fine,
> >> >> > 2 didn't start but I could use `ceph-objectstore-tool ...
--op list-pgs` while > >> >> osd.0 > >> >> > I couldn't even list the pgs, it threw an error right away - more > >> about > >> >> it > >> >> > later in the email) > >> >> > > >> >> > Two of the unavailable shards, when exporting, ceph-objectstore-tool > >> >> core > >> >> > dumped with the same issue in the rocksdb, but I should have enough > >> >> chunks > >> >> > to not need them - just mentioning in case is useful: > >> >> > > >> >> > sh-5.1# ceph-objectstore-tool --data-path /var/lib/ceph/osd --pgid > >> >> 11.19s2 > >> >> > --op export --file pg.11.19s2.dat > >> >> > /ceph/rpmbuild/BUILD/ceph-20.2.0/src/kv/RocksDBStore.cc: In function > >> >> > 'virtual int RocksDBStore::get(const std::string&, const std::string&, > >> >> > ceph::bufferlist*)' thread 7ff3be4ca800 time > >> >> 2026-02-04T09:42:00.743877+0000 > >> >> > /ceph/rpmbuild/BUILD/ceph-20.2.0/src/kv/RocksDBStore.cc: 1961: > >> >> > ceph_abort_msg("block checksum mismatch: stored = 246217859, computed > >> = > >> >> > 2155741315, type = 4 in db/170027.sst offset 28264757 size 1417") > >> >> > ceph version 20.2.0 (69f84cc2651aa259a15bc192ddaabd3baba07489) > >> >> tentacle > >> >> > (stable - RelWithDebInfo) > >> >> > 1: (ceph::__ceph_abort(char const*, int, char const*, > >> >> > std::__cxx11::basic_string<char, std::char_traits<char>, > >> >> > std::allocator<char> > const&)+0xc9) [0x7ff3bf5391fd] > >> >> > 2: (RocksDBStore::get(std::__cxx11::basic_string<char, > >> >> > std::char_traits<char>, std::allocator<char> > const&, > >> >> > std::__cxx11::basic_string<char, std::char_traits<char>, > >> >> > std::allocator<char> > const&, ceph::buffer::v15_2_0::list*)+0x3bc) > >> >> > [0x555667b340bc] > >> >> > 3: > >> >> > > >> >> > >> (BlueStore::omap_get_values(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, > >> >> > ghobject_t const&, > >> >> > std::set<std::__cxx11::basic_string<char,std::char_traits<char>, > >> >> > std::allocator<char> >, std::less<std::__cxx11::basic_string<char, > >> >> > 
std::char_traits<char>, std::allocator<char> > >, > >> >> > std::allocator<std::__cxx11::basic_string<char, > >> std::char_traits<char>, > >> >> > std::allocator<char> > > > const&, > >> >> > std::map<std::__cxx11::basic_string<char,std::char_traits<char>, > >> >> > std::allocator<char> >, ceph::buffer::v15_2_0::list, > >> >> > std::less<std::__cxx11::basic_string<char, std::char_traits<char>, > >> >> > std::allocator<char> > >, > >> >> > std::allocator<std::pair<std::__cxx11::basic_string<char, > >> >> > std::char_traits<char>, std::allocator<char> > const, > >> >> > ceph::buffer::v15_2_0::list> > >*)+0x401) [0x555667a25fe1] > >> >> > 4: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*)+0x361) > >> >> > [0x5556675e0101] > >> >> > 5: main() > >> >> > 6: /lib64/libc.so.6(+0x2a610) [0x7ff3be930610] > >> >> > 7: __libc_start_main() > >> >> > 8: _start() > >> >> > *** Caught signal (Aborted) ** > >> >> > in thread 7ff3be4ca800 thread_name:ceph-objectstor > >> >> > ceph version 20.2.0 (69f84cc2651aa259a15bc192ddaabd3baba07489) > >> >> tentacle > >> >> > (stable - RelWithDebInfo) > >> >> > 1: /lib64/libc.so.6(+0x3fc30) [0x7ff3be945c30] > >> >> > 2: /lib64/libc.so.6(+0x8d03c) [0x7ff3be99303c] > >> >> > 3: raise() > >> >> > 4: abort() > >> >> > 5: (ceph::__ceph_abort(char const*, int, char const*, > >> >> > std::__cxx11::basic_string<char, std::char_traits<char>, > >> >> > std::allocator<char> > const&)+0x186) [0x7ff3bf5392ba] > >> >> > 6: (RocksDBStore::get(std::__cxx11::basic_string<char, > >> >> > std::char_traits<char>, std::allocator<char> > const&, > >> >> > std::__cxx11::basic_string<char, std::char_traits<char>, > >> >> > std::allocator<char> > const&, ceph::buffer::v15_2_0::list*)+0x3bc) > >> >> > [0x555667b340bc] > >> >> > 7: > >> >> > > >> >> > >> (BlueStore::omap_get_values(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, > >> >> > ghobject_t const&, > >> >> > std::set<std::__cxx11::basic_string<char,std::char_traits<char>, > >> >> > 
std::allocator<char> >, std::less<std::__cxx11::basic_string<char, > >> >> > std::char_traits<char>, std::allocator<char> > >, > >> >> > std::allocator<std::__cxx11::basic_string<char, > >> std::char_traits<char>, > >> >> > std::allocator<char> > > > const&, > >> >> > std::map<std::__cxx11::basic_string<char,std::char_traits<char>, > >> >> > std::allocator<char> >, ceph::buffer::v15_2_0::list, > >> >> > std::less<std::__cxx11::basic_string<char, std::char_traits<char>, > >> >> > std::allocator<char> > >, > >> >> > std::allocator<std::pair<std::__cxx11::basic_string<char, > >> >> > std::char_traits<char>, std::allocator<char> > const, > >> >> > ceph::buffer::v15_2_0::list> > >*)+0x401) [0x555667a25fe1] > >> >> > 8: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*)+0x361) > >> >> > [0x5556675e0101] > >> >> > 9: main() > >> >> > 10: /lib64/libc.so.6(+0x2a610) [0x7ff3be930610] > >> >> > 11: __libc_start_main() > >> >> > 12: _start() > >> >> > Aborted (core dumped) > >> >> > > >> >> > > >> >> > > >> >> > After importing all shards that I could recover that weren't > >> available, > >> >> I > >> >> > don't have any "unknown" pgs anymore. I still have lots of PGs in > >> "down" > >> >> > state, which I assume I need to flag both "dead" OSDs as lost to > >> unstuck > >> >> > them. Since it is an operation I cannot go back, I would like to > >> confirm > >> >> > that is indeed the correct next step to take. > >> >> > > >> >> > I have a few questions to understand "what happens" in the next step > >> >> > (marking osd as lost?): > >> >> > > >> >> > Shall I assume that once I flag an OSD as lost, I won't be able to > >> >> > "activate" it since I use encryption when initializing the bluestore > >> >> OSD, > >> >> > or flagging them as lost won't destroy their unlocking keys? (which > >> >> means > >> >> > any hope of further extracting data to be gone, mostly on the osd.0 > >> >> which I > >> >> > couldn't use ceph-objectstore-tool at all since the power loss). 
> >> >> > > >> >> > I think I should have all the shards from the PGs but just in case, > >> I've > >> >> > managed to make a clone of the osd.0 on a different physical disk (the > >> >> > other reason I took long to answer). But still ceph-objectstore-tool > >> >> > refuses to run: > >> >> > > >> >> > # ceph-objectstore-tool --data-path /var/lib/ceph/osd --op list-pgs > >> >> > Mount failed with '(5) Input/output error' > >> >> > > >> >> > # ls -l /var/lib/ceph/osd > >> >> > total 28 > >> >> > lrwxrwxrwx 1 ceph ceph 50 Feb 4 08:26 block -> > >> >> > /dev/mapper/zNPZJR-i0TZ-6NtK-URto-tjfs-iJRb-GCAYEm > >> >> > -rw------- 1 ceph ceph 37 Feb 4 08:26 ceph_fsid > >> >> > -rw------- 1 ceph ceph 37 Feb 4 08:26 fsid > >> >> > -rw------- 1 ceph ceph 55 Feb 4 08:26 keyring > >> >> > -rw------- 1 ceph ceph 106 Jan 24 00:44 lockbox.keyring > >> >> > -rw------- 1 ceph ceph 6 Feb 4 08:26 ready > >> >> > -rw------- 1 ceph ceph 10 Feb 4 08:26 type > >> >> > -rw------- 1 ceph ceph 2 Feb 4 08:26 whoami > >> >> > > >> >> > Just as information, all except 2 pools in my cluster are > >> "replicated". > >> >> > Pools id 11 and 16 are erasure coded (2+1). If I understood correctly, > >> >> as > >> >> > long as I have two acting shards (and at most one "NONE"), data should > >> >> be > >> >> > available (at least in read-only) once I mark the down OSDs as lost. > >> Is > >> >> > that understanding correct? > >> >> > > >> >> > Another information, pools 10 and 15 are the "replicated root pools" > >> >> before > >> >> > the erasure coded pools were created. 
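To sanity-check the EC-availability question above before the irreversible step, one could inspect the pool parameters (pool name from the thread; "<profile>" is a placeholder for whatever the profile listing returns):

```shell
# For a k=2,m=1 EC pool, reads need any 2 of the 3 shards; check that
# min_size actually permits serving I/O with one shard missing.
ceph osd pool get limbo.data_ec min_size
ceph osd pool get limbo.data_ec erasure_code_profile
ceph osd erasure-code-profile get <profile>   # shows k=2 m=1

# If min_size is k+1 (a common default), PGs stay inactive with a shard
# missing; operators sometimes lower it to k temporarily, trading safety:
ceph osd pool set limbo.data_ec min_size 2

# The irreversible step: tell the cluster the OSD's data is gone for good.
ceph osd lost 0 --yes-i-really-mean-it
```

Note that `ceph osd lost` is an OSDMap-level declaration; by itself it does not write to or wipe the OSD's disk.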
> >> >> > > >> >> > Ignoring osd.0 for now, here are the current state of my cluster (mds > >> is > >> >> > intentionally not started while I try to fix the PGs): > >> >> > ### ceph osd lspools > >> >> > 3 .rgw.root > >> >> > 4 default.rgw.log > >> >> > 5 default.rgw.control > >> >> > 6 default.rgw.meta > >> >> > 10 ark.data > >> >> > 11 ark.data_ec > >> >> > 12 ark.metadata > >> >> > 14 .mgr > >> >> > 15 limbo > >> >> > 16 limbo.data_ec > >> >> > 18 default.rgw.buckets.index > >> >> > 19 default.rgw.buckets.data > >> >> > ### > >> >> > > >> >> > ### ceph health > >> >> > # ceph -s > >> >> > cluster: > >> >> > id: 021f058f-dbf3-4a23-adb5-21d83f3f1bb6 > >> >> > health: HEALTH_ERR > >> >> > 1 filesystem is degraded > >> >> > 1 filesystem has a failed mds daemon > >> >> > 1 filesystem is offline > >> >> > insufficient standby MDS daemons available > >> >> > Reduced data availability: 143 pgs inactive, 143 pgs down > >> >> > Degraded data redundancy: 1303896/7149898 objects > >> degraded > >> >> > (18.237%), 218 pgs degraded, 316 pgs undersized > >> >> > 144 pgs not deep-scrubbed in time > >> >> > 459 pgs not scrubbed in time > >> >> > 256 slow ops, oldest one blocked for 1507794 sec, osd.1 > >> has > >> >> > slow ops > >> >> > too many PGs per OSD (657 > max 500) > >> >> > > >> >> > services: > >> >> > mon: 2 daemons, quorum ceph-ymir-mon2,ceph-ymir-mon1 (age 2w) > >> >> > mgr: ceph-ymir-mgr1(active, since 2w) > >> >> > mds: 0/1 daemons up (1 failed) > >> >> > osd: 4 osds: 2 up (since 29m), 2 in (since 4w); 24 remapped pgs > >> >> > > >> >> > data: > >> >> > volumes: 0/1 healthy, 1 failed > >> >> > pools: 12 pools, 529 pgs > >> >> > objects: 2.46M objects, 7.4 TiB > >> >> > usage: 8.3 TiB used, 13 TiB / 22 TiB avail > >> >> > pgs: 27.032% pgs not active > >> >> > 1303896/7149898 objects degraded (18.237%) > >> >> > 306628/7149898 objects misplaced (4.289%) > >> >> > 218 active+undersized+degraded > >> >> > 143 down > >> >> > 98 active+undersized > >> >> > 45 
active+clean > >> >> > 19 active+clean+remapped > >> >> > 4 active+clean+remapped+scrubbing+deep > >> >> > 1 active+clean+remapped+scrubbing > >> >> > 1 active+clean+scrubbing+deep > >> >> > ### ceph health > >> >> > > >> >> > ### ceph health detail > >> >> > # ceph health detail > >> >> > HEALTH_ERR 1 filesystem is degraded; 1 filesystem has a failed mds > >> >> daemon; > >> >> > 1 filesystem is offline; insufficient standby > >> >> > MDS daemons available; Reduced data availability: 143 pgs inactive, > >> >> 143 > >> >> > pgs down; Degraded data redundancy: 1303896/714 > >> >> > 9898 objects degraded (18.237%), 218 pgs degraded, 316 pgs undersized; > >> >> 144 > >> >> > pgs not deep-scrubbed in time; 459 pgs not sc > >> >> > rubbed in time; 256 slow ops, oldest one blocked for 1508207 sec, > >> osd.1 > >> >> has > >> >> > slow ops; too many PGs per OSD (657 > max 50 > >> >> > 0) > >> >> > [WRN] FS_DEGRADED: 1 filesystem is degraded > >> >> > fs ark is degraded > >> >> > [WRN] FS_WITH_FAILED_MDS: 1 filesystem has a failed mds daemon > >> >> > fs ark has 1 failed mds > >> >> > [ERR] MDS_ALL_DOWN: 1 filesystem is offline > >> >> > fs ark is offline because no MDS is active for it. 
> >> >> > [WRN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons > >> >> available > >> >> > have 0; want 1 more > >> >> > [WRN] PG_AVAILABILITY: Reduced data availability: 143 pgs inactive, > >> 143 > >> >> pgs > >> >> > down > >> >> > pg 10.11 is down, acting [1,3] > >> >> > pg 10.18 is down, acting [3,1] > >> >> > pg 10.1d is down, acting [1,3] > >> >> > pg 10.1f is down, acting [1,3] > >> >> > pg 11.10 is down, acting [3,1,NONE] > >> >> > pg 11.12 is down, acting [1,NONE,3] > >> >> > pg 11.18 is stuck inactive for 4w, current state down, last > >> acting > >> >> > [1,3,NONE] > >> >> > pg 11.19 is down, acting [3,1,NONE] > >> >> > pg 11.1b is down, acting [1,NONE,3] > >> >> > pg 11.62 is down, acting [NONE,3,1] > >> >> > pg 11.63 is down, acting [3,NONE,1] > >> >> > pg 11.64 is down, acting [NONE,1,3] > >> >> > pg 11.66 is down, acting [NONE,3,1] > >> >> > pg 11.67 is down, acting [1,NONE,3] > >> >> > pg 11.68 is down, acting [3,NONE,1] > >> >> > pg 11.69 is down, acting [NONE,1,3] > >> >> > pg 11.6a is down, acting [1,NONE,3] > >> >> > pg 11.6b is down, acting [NONE,1,3] > >> >> > pg 11.6f is down, acting [NONE,3,1] > >> >> > pg 11.71 is down, acting [1,3,NONE] > >> >> > pg 11.72 is down, acting [1,3,NONE] > >> >> > pg 11.74 is down, acting [NONE,3,1] > >> >> > pg 11.76 is down, acting [1,NONE,3] > >> >> > pg 11.78 is down, acting [3,1,NONE] > >> >> > pg 11.7d is down, acting [NONE,3,1] > >> >> > pg 11.7e is down, acting [NONE,1,3] > >> >> > pg 15.15 is down, acting [1,3] > >> >> > pg 15.16 is down, acting [3,1] > >> >> > pg 15.17 is down, acting [1,3] > >> >> > pg 15.1a is down, acting [3,1] > >> >> > pg 16.1 is down, acting [1,3,NONE] > >> >> > pg 16.4 is down, acting [1,3,NONE] > >> >> > pg 16.b is down, acting [3,NONE,1] > >> >> > pg 16.60 is down, acting [3,1,NONE] > >> >> > pg 16.61 is down, acting [3,1,NONE] > >> >> > pg 16.62 is down, acting [3,NONE,1] > >> >> > pg 16.63 is down, acting [3,NONE,1] > >> >> > pg 16.65 is down, acting [NONE,3,1] 
> >> >> > pg 16.67 is down, acting [1,NONE,3] > >> >> > pg 16.68 is down, acting [1,NONE,3] > >> >> > pg 16.69 is down, acting [3,1,NONE] > >> >> > pg 16.6a is down, acting [1,3,NONE] > >> >> > pg 16.6c is down, acting [1,3,NONE] > >> >> > pg 16.70 is down, acting [3,NONE,1] > >> >> > pg 16.73 is down, acting [3,NONE,1] > >> >> > pg 16.74 is down, acting [1,3,NONE] > >> >> > pg 16.75 is down, acting [3,1,NONE] > >> >> > pg 16.79 is down, acting [3,NONE,1] > >> >> > pg 16.7a is down, acting [1,3,NONE] > >> >> > pg 16.7e is down, acting [1,3,NONE] > >> >> > pg 16.7f is down, acting [3,NONE,1] > >> >> > [WRN] PG_DEGRADED: Degraded data redundancy: 1303896/7149898 objects > >> >> > degraded (18.237%), 218 pgs degraded, 316 pgs under > >> >> > sized > >> >> > pg 3.18 is stuck undersized for 36m, current state > >> >> active+undersized, > >> >> > last acting [1,3] > >> >> > ...<snipped for brevity> > >> >> > ### > >> >> > > >> >> > Once again, I cannot thank you enough for looking into my issue. > >> >> > I have the impression that being able to recover the data I need is > >> just > >> >> > around the corner. Although the croit.io blog did mention flagging > >> the > >> >> osd > >> >> > as lost, I would like to double check it to avoid losing any > >> >> possibility to > >> >> > recover the data. > >> >> > > >> >> > If there's anything further I could check or if you need full output > >> of > >> >> the > >> >> > commands, let me know. > >> >> > > >> >> > Thanks in advance. > >> >> > > >> >> > On Tue, 3 Feb 2026 at 10:26, Igor Fedotov <[email protected]> > >> >> wrote: > >> >> > > >> >> >> Hi Theo, > >> >> >> > >> >> >> you might want to try to use PG export/import using > >> >> ceph-objectstore-tool. 
> >> >> >>
> >> >> >> Please find more details here:
> >> >> >> https://www.croit.io/blog/how-to-recover-inactive-pgs-using-ceph-objectstore-tool-on-ceph-clusters
> >> >> >>
> >> >> >> Thanks,
> >> >> >>
> >> >> >> Igor
> >> >> >>
> >> >> >> On 03/02/2026 02:38, Theo Cabrerizo Diem via ceph-users wrote:
> >> >> >>
> >> >> >> :12:18.895+0000 7f0c543eac00 -1 bluestore(/var/lib/ceph/osd)
> >> >> >> fsck error: free extent 0x1714c521000~978b26df000 intersects allocated blocks
> >> >> >> fsck status: remaining 1 error(s) and warning(s)

--
Alexander Patrakov
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
