Just checking whether anybody has seen this. About 15 more OSDs have failed since
then. The cluster can't backfill fast enough, and I fear data loss may be
imminent. I did notice that one of the latest OSDs to fail has 9999 lines
similar to this one right before the crash:

2019-07-08 15:18:56.170 7fc732475700  5
bluestore(/var/lib/ceph/osd/ceph-59) allocate_bluefs_freespace gifting
0x4d18d00000~400000 to bluefs
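
In case anyone wants to check their own OSD logs for the same pattern, here is a rough sketch of how the "gifting" lines could be counted. It assumes each entry sits on a single line in the real log (the wrapping above comes from email), and that both the offset and the length after the `0x` prefix are hexadecimal. The function name `summarize_gifts` is made up for illustration.

```python
import re
from collections import Counter

# Matches the "gifting <offset>~<length> to bluefs" part of the log line above.
# Assumption: offset and length are both hexadecimal (only the offset carries 0x).
GIFT_RE = re.compile(
    r"allocate_bluefs_freespace gifting 0x([0-9a-fA-F]+)~([0-9a-fA-F]+) to bluefs"
)


def summarize_gifts(lines):
    """Return (count, total_bytes, repeats) for the gifting lines in `lines`.

    `repeats` maps each offset that was gifted more than once to the number
    of times it appeared.
    """
    offsets = Counter()
    total_bytes = 0
    for line in lines:
        match = GIFT_RE.search(line)
        if match is None:
            continue  # not a gifting line
        offset = int(match.group(1), 16)
        length = int(match.group(2), 16)
        offsets[offset] += 1
        total_bytes += length
    repeats = {off: n for off, n in offsets.items() if n > 1}
    return sum(offsets.values()), total_bytes, repeats
```

To run it against a log, pass an open file, for example `summarize_gifts(open("/var/log/ceph/ceph-osd.59.log"))`. That path is only an assumption about where the OSD log lives on your hosts.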

Any thoughts?

On Sat, Jul 6, 2019 at 3:06 PM Brett Chancellor <bchancel...@salesforce.com>
wrote:

> Has anybody else run into this? It seems to be slowly spreading to other
> OSDs. Maybe it hits a bad PG in the backfill process and kills off
> another OSD (just guessing, since the failures are hours apart). It's kind
> of a pain because I have to continually rebuild these OSDs before the
> cluster runs out of space.
>
> On Wed, Jul 3, 2019 at 2:59 PM Brett Chancellor <
> bchancel...@salesforce.com> wrote:
>
>> Hi All! Today 3 OSDs stopped themselves and are unable to restart,
>> all with the same error. These OSDs are all on different hosts. All are
>> running 14.2.1.
>>
>> I did try the following two commands:
>> - ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-80 list > keys
>>   ## This failed with the same error below
>> - ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-80 fsck
>>  ## After a couple of hours returned...
>> 2019-07-03 18:30:02.095 7fe7c1c1ef00 -1
>> bluestore(/var/lib/ceph/osd/ceph-80) fsck warning: legacy statfs record
>> found, suggest to run store repair to get consistent statistic reports
>> fsck success
>>
>>
>> ## Error when trying to start one of the OSDs
>>    -12> 2019-07-03 18:36:57.450 7f5e42366700 -1 *** Caught signal
>> (Aborted) **
>>  in thread 7f5e42366700 thread_name:rocksdb:low0
>>
>>  ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus
>> (stable)
>>  1: (()+0xf5d0) [0x7f5e50bd75d0]
>>  2: (gsignal()+0x37) [0x7f5e4f9ce207]
>>  3: (abort()+0x148) [0x7f5e4f9cf8f8]
>>  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x199) [0x55a7aaee96ab]
>>  5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char
>> const*, char const*, ...)+0) [0x55a7aaee982a]
>>  6: (interval_set<unsigned long, std::map<unsigned long, unsigned long,
>> std::less<unsigned long>, std::allocator<std::pair<unsigned long const,
>> unsigned long> > > >::insert(unsigned long, unsigned long, unsigned long*,
>> unsigned long*)+0x3c6) [0x55a7ab212a66]
>>  7: (BlueStore::allocate_bluefs_freespace(unsigned long, unsigned long,
>> std::vector<bluestore_pextent_t,
>> mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t>
>> >*)+0x74e) [0x55a7ab48253e]
>>  8: (BlueFS::_expand_slow_device(unsigned long,
>> std::vector<bluestore_pextent_t,
>> mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t>
>> >&)+0x111) [0x55a7ab59e921]
>>  9: (BlueFS::_allocate(unsigned char, unsigned long,
>> bluefs_fnode_t*)+0x68b) [0x55a7ab59f68b]
>>  10: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned
>> long)+0xe5) [0x55a7ab59fce5]
>>  11: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x10b) [0x55a7ab5a1b4b]
>>  12: (BlueRocksWritableFile::Flush()+0x3d) [0x55a7ab5bf84d]
>>  13: (rocksdb::WritableFileWriter::Flush()+0x19e) [0x55a7abbedd0e]
>>  14: (rocksdb::WritableFileWriter::Sync(bool)+0x2e) [0x55a7abbedfee]
>>  15: (rocksdb::CompactionJob::FinishCompactionOutputFile(rocksdb::Status
>> const&, rocksdb::CompactionJob::SubcompactionState*,
>> rocksdb::RangeDelAggregator*, CompactionIterationStats*, rocksdb::Slice
>> const*)+0xbaa) [0x55a7abc3b73a]
>>  16:
>> (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::CompactionJob::SubcompactionState*)+0x7d0)
>> [0x55a7abc3f150]
>>  17: (rocksdb::CompactionJob::Run()+0x298) [0x55a7abc40618]
>>  18: (rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*,
>> rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*)+0xcb7)
>> [0x55a7aba7fb67]
>>  19:
>> (rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*,
>> rocksdb::Env::Priority)+0xd0) [0x55a7aba813c0]
>>  20: (rocksdb::DBImpl::BGWorkCompaction(void*)+0x3a) [0x55a7aba8190a]
>>  21: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x264)
>> [0x55a7abc8d9c4]
>>  22: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x4f)
>> [0x55a7abc8db4f]
>>  23: (()+0x129dfff) [0x55a7abd1afff]
>>  24: (()+0x7dd5) [0x7f5e50bcfdd5]
>>  25: (clone()+0x6d) [0x7f5e4fa95ead]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>> to interpret this.
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
