Just checking whether anybody has seen this. About 15 more OSDs have failed since then. The cluster can't backfill fast enough, and I fear data loss may be imminent. I did notice that one of the latest OSDs to fail logged 9999 lines similar to this one right before the crash:
2019-07-08 15:18:56.170 7fc732475700 5 bluestore(/var/lib/ceph/osd/ceph-59) allocate_bluefs_freespace gifting 0x4d18d00000~400000 to bluefs

Any thoughts?

On Sat, Jul 6, 2019 at 3:06 PM Brett Chancellor <bchancel...@salesforce.com> wrote:

> Has anybody else run into this? It seems to be slowly spreading to other
> OSDs, maybe it gets to a bad pg in the backfill process and kills off
> another OSD (just guessing since the failures are hours apart). It's kind
> of a pain because I have to continually rebuild these OSDs before the
> cluster runs out of space.
>
> On Wed, Jul 3, 2019 at 2:59 PM Brett Chancellor <
> bchancel...@salesforce.com> wrote:
>
>> Hi All! Today I've had 3 OSDs stop themselves and are unable to restart,
>> all with the same error. These OSDs are all on different hosts. All are
>> running 14.2.1
>>
>> I did try the following two commands
>> - ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-80 list > keys
>> ## This failed with the same error below
>> - ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-80 fsck
>> ## After a couple of hours returned...
>> 2019-07-03 18:30:02.095 7fe7c1c1ef00 -1
>> bluestore(/var/lib/ceph/osd/ceph-80) fsck warning: legacy statfs record
>> found, suggest to run store repair to get consistent statistic reports
>> fsck success
>>
>>
>> ## Error when trying to start one of the OSDs
>> -12> 2019-07-03 18:36:57.450 7f5e42366700 -1 *** Caught signal
>> (Aborted) **
>> in thread 7f5e42366700 thread_name:rocksdb:low0
>>
>> ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus
>> (stable)
>> 1: (()+0xf5d0) [0x7f5e50bd75d0]
>> 2: (gsignal()+0x37) [0x7f5e4f9ce207]
>> 3: (abort()+0x148) [0x7f5e4f9cf8f8]
>> 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x199) [0x55a7aaee96ab]
>> 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char
>> const*, char const*, ...)+0) [0x55a7aaee982a]
>> 6: (interval_set<unsigned long, std::map<unsigned long, unsigned long,
>> std::less<unsigned long>, std::allocator<std::pair<unsigned long const,
>> unsigned long> > > >::insert(unsigned long, unsigned long, unsigned long*,
>> unsigned long*)+0x3c6) [0x55a7ab212a66]
>> 7: (BlueStore::allocate_bluefs_freespace(unsigned long, unsigned long,
>> std::vector<bluestore_pextent_t,
>> mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t>
>> >*)+0x74e) [0x55a7ab48253e]
>> 8: (BlueFS::_expand_slow_device(unsigned long,
>> std::vector<bluestore_pextent_t,
>> mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t>
>> >&)+0x111) [0x55a7ab59e921]
>> 9: (BlueFS::_allocate(unsigned char, unsigned long,
>> bluefs_fnode_t*)+0x68b) [0x55a7ab59f68b]
>> 10: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned
>> long)+0xe5) [0x55a7ab59fce5]
>> 11: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x10b) [0x55a7ab5a1b4b]
>> 12: (BlueRocksWritableFile::Flush()+0x3d) [0x55a7ab5bf84d]
>> 13: (rocksdb::WritableFileWriter::Flush()+0x19e) [0x55a7abbedd0e]
>> 14: (rocksdb::WritableFileWriter::Sync(bool)+0x2e) [0x55a7abbedfee]
>> 15: (rocksdb::CompactionJob::FinishCompactionOutputFile(rocksdb::Status
>> const&, rocksdb::CompactionJob::SubcompactionState*,
>> rocksdb::RangeDelAggregator*, CompactionIterationStats*, rocksdb::Slice
>> const*)+0xbaa) [0x55a7abc3b73a]
>> 16:
>> (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::CompactionJob::SubcompactionState*)+0x7d0)
>> [0x55a7abc3f150]
>> 17: (rocksdb::CompactionJob::Run()+0x298) [0x55a7abc40618]
>> 18: (rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*,
>> rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*)+0xcb7)
>> [0x55a7aba7fb67]
>> 19:
>> (rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*,
>> rocksdb::Env::Priority)+0xd0) [0x55a7aba813c0]
>> 20: (rocksdb::DBImpl::BGWorkCompaction(void*)+0x3a) [0x55a7aba8190a]
>> 21: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x264)
>> [0x55a7abc8d9c4]
>> 22: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x4f)
>> [0x55a7abc8db4f]
>> 23: (()+0x129dfff) [0x55a7abd1afff]
>> 24: (()+0x7dd5) [0x7f5e50bcfdd5]
>> 25: (clone()+0x6d) [0x7f5e4fa95ead]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>> to interpret this.
>>
>
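
In case anyone wants to check their own OSD logs for the same run of "gifting" messages before a crash, here is a minimal Python sketch that counts them per OSD. It is not a Ceph tool; the function name summarize_gifts is made up for illustration. It assumes the line format shown in the log excerpt above, and it assumes the value after the "~" is a hex length in bytes, which is how extents appear to be printed but which I have not confirmed against the source.

```python
import re
from collections import Counter

# Matches log lines of the form seen in this thread, e.g.
#   ... bluestore(/var/lib/ceph/osd/ceph-59) allocate_bluefs_freespace
#   gifting 0x4d18d00000~400000 to bluefs
# Assumption: the number after "~" is a hex length without a 0x prefix.
GIFT_RE = re.compile(
    r"bluestore\((?P<path>[^)]+)\)\s+allocate_bluefs_freespace\s+gifting\s+"
    r"0x(?P<offset>[0-9a-fA-F]+)~(?P<length>[0-9a-fA-F]+)\s+to bluefs"
)


def summarize_gifts(lines):
    """Return {osd_path: (gift_count, total_gifted_bytes)} for the given lines."""
    counts = Counter()
    totals = Counter()
    for line in lines:
        match = GIFT_RE.search(line)
        if match is None:
            continue  # not a gifting line
        path = match.group("path")
        counts[path] += 1
        totals[path] += int(match.group("length"), 16)
    return {path: (counts[path], totals[path]) for path in counts}
```

Any iterable of lines works, so an open log file can be passed in directly, for example open("/var/log/ceph/ceph-osd.59.log") if the logs are in the default location (an assumption about the setup). A large count with a steadily growing total on one OSD would match the pattern described at the top of this message.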
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com