I was able to apply the patches to mimic, but nothing changed. The one OSD I had expanded space on fails with a BlueFS mount I/O error; the others keep failing with ENOSPC.
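For reference, the patched rebalance is meant to be triggered via the normal offline repair, roughly like this (a sketch only - the OSD id and data path are illustrative, and the OSD must be stopped first):

    systemctl stop ceph-osd@1
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-1
    systemctl start ceph-osd@1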
On 1.10.2018, at 19:26, Igor Fedotov <ifedo...@suse.de> wrote:
>
> So you should call repair, which rebalances (i.e. allocates additional)
> BlueFS space, hence allowing the OSD to start.
>
> Thanks,
>
> Igor
>
> On 10/1/2018 7:22 PM, Igor Fedotov wrote:
>> Not exactly. The rebalancing from this kv_sync_thread still might be
>> deferred due to the nature of this thread (not 100% sure though).
>>
>> Here is my PR showing the idea (still untested and perhaps unfinished!!!)
>>
>> https://github.com/ceph/ceph/pull/24353
>>
>> Igor
>>
>> On 10/1/2018 7:07 PM, Sergey Malinin wrote:
>>> Can you please confirm whether I got this right:
>>>
>>> --- BlueStore.cc.bak    2018-10-01 18:54:45.096836419 +0300
>>> +++ BlueStore.cc        2018-10-01 19:01:35.937623861 +0300
>>> @@ -9049,22 +9049,17 @@
>>>        throttle_bytes.put(costs);
>>>
>>>        PExtentVector bluefs_gift_extents;
>>> -      if (bluefs &&
>>> -          after_flush - bluefs_last_balance >
>>> -          cct->_conf->bluestore_bluefs_balance_interval) {
>>> -        bluefs_last_balance = after_flush;
>>> -        int r = _balance_bluefs_freespace(&bluefs_gift_extents);
>>> -        assert(r >= 0);
>>> -        if (r > 0) {
>>> -          for (auto& p : bluefs_gift_extents) {
>>> -            bluefs_extents.insert(p.offset, p.length);
>>> -          }
>>> -          bufferlist bl;
>>> -          encode(bluefs_extents, bl);
>>> -          dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
>>> -                   << bluefs_extents << std::dec << dendl;
>>> -          synct->set(PREFIX_SUPER, "bluefs_extents", bl);
>>> +      int r = _balance_bluefs_freespace(&bluefs_gift_extents);
>>> +      ceph_assert(r >= 0);
>>> +      if (r > 0) {
>>> +        for (auto& p : bluefs_gift_extents) {
>>> +          bluefs_extents.insert(p.offset, p.length);
>>>          }
>>> +        bufferlist bl;
>>> +        encode(bluefs_extents, bl);
>>> +        dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
>>> +                 << bluefs_extents << std::dec << dendl;
>>> +        synct->set(PREFIX_SUPER, "bluefs_extents", bl);
>>>        }
>>>
>>>        // cleanup sync deferred keys
>>>
>>> On 1.10.2018, at 18:39, Igor Fedotov <ifedo...@suse.de> wrote:
>>>>
>>>> So you have just a single main device per OSD...
>>>>
>>>> Then bluestore-tool wouldn't help: it's unable to expand the BlueFS
>>>> partition on the main device, only standalone devices are supported.
>>>>
>>>> Given that you're able to rebuild the code, I can suggest making a patch
>>>> that triggers a BlueFS rebalance (see the code snippet below) on repair.
>>>>
>>>> PExtentVector bluefs_gift_extents;
>>>> int r = _balance_bluefs_freespace(&bluefs_gift_extents);
>>>> ceph_assert(r >= 0);
>>>> if (r > 0) {
>>>>   for (auto& p : bluefs_gift_extents) {
>>>>     bluefs_extents.insert(p.offset, p.length);
>>>>   }
>>>>   bufferlist bl;
>>>>   encode(bluefs_extents, bl);
>>>>   dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
>>>>            << bluefs_extents << std::dec << dendl;
>>>>   synct->set(PREFIX_SUPER, "bluefs_extents", bl);
>>>> }
>>>>
>>>> If it can wait, I can probably make a corresponding PR tomorrow.
>>>>
>>>> Thanks,
>>>> Igor
>>>>
>>>> On 10/1/2018 6:16 PM, Sergey Malinin wrote:
>>>>> I have rebuilt the tool, but none of my OSDs, dead or alive, have any
>>>>> symlinks other than 'block' pointing to LVM.
>>>>> I adjusted the main device size, but it looks like it needs even more
>>>>> space for DB compaction. After executing bluefs-bdev-expand the OSD
>>>>> fails to start, although the 'fsck' and 'repair' commands finished
>>>>> successfully.
>>>>>
>>>>> 2018-10-01 18:02:39.755 7fc9226c6240  1 freelist init
>>>>> 2018-10-01 18:02:39.763 7fc9226c6240  1 bluestore(/var/lib/ceph/osd/ceph-1) _open_alloc opening allocation metadata
>>>>> 2018-10-01 18:02:40.907 7fc9226c6240  1 bluestore(/var/lib/ceph/osd/ceph-1) _open_alloc loaded 285 GiB in 2249899 extents
>>>>> 2018-10-01 18:02:40.951 7fc9226c6240 -1 bluestore(/var/lib/ceph/osd/ceph-1) _reconcile_bluefs_freespace bluefs extra 0x[6d6f000000~50c800000]
>>>>> 2018-10-01 18:02:40.951 7fc9226c6240  1 stupidalloc 0x0x55d053fb9180 shutdown
>>>>> 2018-10-01 18:02:40.963 7fc9226c6240  1 freelist shutdown
>>>>> 2018-10-01 18:02:40.963 7fc9226c6240  4 rocksdb: [/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:252] Shutdown: canceling all background work
>>>>> 2018-10-01 18:02:40.967 7fc9226c6240  4 rocksdb: [/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:397] Shutdown complete
>>>>> 2018-10-01 18:02:40.971 7fc9226c6240  1 bluefs umount
>>>>> 2018-10-01 18:02:40.975 7fc9226c6240  1 stupidalloc 0x0x55d053883800 shutdown
>>>>> 2018-10-01 18:02:40.975 7fc9226c6240  1 bdev(0x55d053c32e00 /var/lib/ceph/osd/ceph-1/block) close
>>>>> 2018-10-01 18:02:41.267 7fc9226c6240  1 bdev(0x55d053c32a80 /var/lib/ceph/osd/ceph-1/block) close
>>>>> 2018-10-01 18:02:41.443 7fc9226c6240 -1 osd.1 0 OSD:init: unable to mount object store
>>>>> 2018-10-01 18:02:41.443 7fc9226c6240 -1  ** ERROR: osd init failed: (5) Input/output error
>>>>>
>>>>> On 1.10.2018, at 18:09, Igor Fedotov <ifedo...@suse.de> wrote:
>>>>>>
>>>>>> Well, actually you can avoid rebuilding bluestore-tool.
>>>>>>
>>>>>> You'll need to edit the first chunk of block.db where the labels are
>>>>>> stored. (Please make a backup first!!!)
>>>>>>
>>>>>> The size label is stored at offset 0x52 and is 8 bytes long - a
>>>>>> little-endian 64-bit integer. (Please verify that the old value at this
>>>>>> offset exactly corresponds to your original volume size and/or the
>>>>>> 'size' label reported by ceph-bluestore-tool.)
>>>>>>
>>>>>> So you have to put the new DB volume size there. Or you can send me the
>>>>>> first 4K chunk (e.g. extracted with dd) along with the new DB volume
>>>>>> size (in bytes) and I'll do that for you.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Igor
>>>>>>
>>>>>> On 10/1/2018 5:32 PM, Igor Fedotov wrote:
>>>>>>> On 10/1/2018 5:03 PM, Sergey Malinin wrote:
>>>>>>>> Before I received your response, I had already added 20GB to the OSD
>>>>>>>> (by expanding the LV followed by bluefs-bdev-expand) and ran
>>>>>>>> "ceph-kvstore-tool bluestore-kv <path> compact", however it still
>>>>>>>> needs more space.
>>>>>>>> Is that because I didn't update the DB size with set-label-key?
>>>>>>> In mimic you need to run both the "bluefs-bdev-expand" and
>>>>>>> "set-label-key" commands to commit a BlueFS volume expansion.
>>>>>>> Unfortunately the latter doesn't handle the "size" label properly.
>>>>>>> That's why you might need to backport and rebuild with the mentioned
>>>>>>> commits.
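To make Igor's procedure above concrete, this is roughly what it would look like for a standalone block.db on LVM (a sketch only - the VG/LV names, OSD id and grow size are placeholders; since my OSDs have no separate DB device it does not directly apply here; back up the label block before writing anything):

    # grow the LV backing block.db, then tell BlueFS about the new size
    lvextend -L +20G /dev/vg0/osd1-db
    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-1

    # back up the 4K label block before editing it
    dd if=/dev/vg0/osd1-db of=/root/osd1-db-label.bak bs=4096 count=1

    # inspect the current 8-byte little-endian "size" value at offset 0x52
    xxd -s 0x52 -l 8 /root/osd1-db-label.bak

    # write the new DB volume size (in bytes) at offset 0x52
    python3 -c 'import struct,sys; sys.stdout.buffer.write(struct.pack("<Q", int(sys.argv[1])))' \
        "$(blockdev --getsize64 /dev/vg0/osd1-db)" \
        | dd of=/dev/vg0/osd1-db bs=1 seek=$((0x52)) count=8 conv=notrunc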
>>>>>>>
>>>>>>>> What exactly is the label key that needs to be updated? I couldn't
>>>>>>>> find which one is related to the DB:
>>>>>>>>
>>>>>>>> # ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-1
>>>>>>>> inferring bluefs devices from bluestore path
>>>>>>>> {
>>>>>>>>     "/var/lib/ceph/osd/ceph-1/block": {
>>>>>>>>         "osd_uuid": "f8f122ee-70a6-4c54-8eb0-9b42205b1ecc",
>>>>>>>>         "size": 471305551872,
>>>>>>>>         "btime": "2018-07-31 03:06:43.751243",
>>>>>>>>         "description": "main",
>>>>>>>>         "bluefs": "1",
>>>>>>>>         "ceph_fsid": "7d320499-5b3f-453e-831f-60d4db9a4533",
>>>>>>>>         "kv_backend": "rocksdb",
>>>>>>>>         "magic": "ceph osd volume v026",
>>>>>>>>         "mkfs_done": "yes",
>>>>>>>>         "osd_key": "XXX",
>>>>>>>>         "ready": "ready",
>>>>>>>>         "whoami": "1"
>>>>>>>>     }
>>>>>>>> }
>>>>>>> The 'size' label - but your output is for the block (aka slow) device.
>>>>>>>
>>>>>>> It should return labels for the db/wal devices as well (the block.db
>>>>>>> and block.wal symlinks respectively). It works for me in master, though
>>>>>>> I can't verify with mimic at the moment.
>>>>>>> Here is the output for master:
>>>>>>>
>>>>>>> # bin/ceph-bluestore-tool show-label --path dev/osd0
>>>>>>> inferring bluefs devices from bluestore path
>>>>>>> {
>>>>>>>     "dev/osd0/block": {
>>>>>>>         "osd_uuid": "404dcbe9-3f8d-4ef5-ac59-2582454a9a75",
>>>>>>>         "size": 21474836480,
>>>>>>>         "btime": "2018-09-10 15:55:09.044039",
>>>>>>>         "description": "main",
>>>>>>>         "bluefs": "1",
>>>>>>>         "ceph_fsid": "56eddc15-11b9-4e0b-9192-e391fbae551c",
>>>>>>>         "kv_backend": "rocksdb",
>>>>>>>         "magic": "ceph osd volume v026",
>>>>>>>         "mkfs_done": "yes",
>>>>>>>         "osd_key": "AQCsaZZbYTxXJBAAe3jJI4p6WbMjvA8CBBUJbA==",
>>>>>>>         "ready": "ready",
>>>>>>>         "whoami": "0"
>>>>>>>     },
>>>>>>>     "dev/osd0/block.wal": {
>>>>>>>         "osd_uuid": "404dcbe9-3f8d-4ef5-ac59-2582454a9a75",
>>>>>>>         "size": 1048576000,
>>>>>>>         "btime": "2018-09-10 15:55:09.044985",
>>>>>>>         "description": "bluefs wal"
>>>>>>>     },
>>>>>>>     "dev/osd0/block.db": {
>>>>>>>         "osd_uuid": "404dcbe9-3f8d-4ef5-ac59-2582454a9a75",
>>>>>>>         "size": 1048576000,
>>>>>>>         "btime": "2018-09-10 15:55:09.044469",
>>>>>>>         "description": "bluefs db"
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> You can also try the --dev option instead of --path, e.g.
>>>>>>> ceph-bluestore-tool show-label --dev <path-to-block.db>
>>>>>>>
>>>>>>>> On 1.10.2018, at 16:48, Igor Fedotov <ifedo...@suse.de> wrote:
>>>>>>>>>
>>>>>>>>> This looks like a sort of deadlock: BlueFS needs some additional
>>>>>>>>> space to replay the log left after the crash, which happens during
>>>>>>>>> BlueFS open.
>>>>>>>>>
>>>>>>>>> But such space (on the slow device, since the DB is full) is gifted
>>>>>>>>> in the background during the BlueFS rebalance procedure, which would
>>>>>>>>> only occur after the open.
>>>>>>>>>
>>>>>>>>> Hence the OSDs are stuck permanently crashing.
>>>>>>>>>
>>>>>>>>> The only way to recover I can suggest for now is to expand the DB
>>>>>>>>> volumes. You can do that with LVM tools if you have any spare space.
>>>>>>>>>
>>>>>>>>> Once resized, you'll need ceph-bluestore-tool to indicate the volume
>>>>>>>>> expansion to BlueFS (the bluefs-bdev-expand command) and finally to
>>>>>>>>> update the DB volume size label with the set-label-key command.
>>>>>>>>>
>>>>>>>>> The latter is a bit tricky for mimic - you might need to backport
>>>>>>>>> https://github.com/ceph/ceph/pull/22085/commits/ffac450da5d6e09cf14b8363b35f21819b48f38b
>>>>>>>>> and rebuild ceph-bluestore-tool.
>>>>>>>>> Alternatively you can backport
>>>>>>>>> https://github.com/ceph/ceph/pull/22085/commits/71c3b58da4e7ced3422bce2b1da0e3fa9331530b
>>>>>>>>> and then the BlueFS expansion and the label update will happen in a
>>>>>>>>> single step.
>>>>>>>>>
>>>>>>>>> I'll do these backports upstream, but it will take some time for
>>>>>>>>> them to pass all the procedures and get into an official mimic
>>>>>>>>> release.
>>>>>>>>>
>>>>>>>>> I'll also file a ticket to fix the original issue.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Igor
>>>>>>>>>
>>>>>>>>> On 10/1/2018 3:28 PM, Sergey Malinin wrote:
>>>>>>>>>> These are LVM bluestore NVMe SSDs created with "ceph-volume --lvm
>>>>>>>>>> prepare --bluestore /dev/nvme0n1p3", i.e. without specifying wal/db
>>>>>>>>>> devices.
>>>>>>>>>> The OSDs were created with bluestore_min_alloc_size_ssd=4096;
>>>>>>>>>> another modified setting is bluestore_cache_kv_max=1073741824.
>>>>>>>>>>
>>>>>>>>>> DB/block usage collected by the prometheus module for the 3 failed
>>>>>>>>>> and 1 surviving OSDs:
>>>>>>>>>>
>>>>>>>>>> ceph_bluefs_db_total_bytes{ceph_daemon="osd.0"} 65493008384.0
>>>>>>>>>> ceph_bluefs_db_total_bytes{ceph_daemon="osd.1"} 49013587968.0
>>>>>>>>>> ceph_bluefs_db_total_bytes{ceph_daemon="osd.2"} 76834406400.0  --> this one has survived
>>>>>>>>>> ceph_bluefs_db_total_bytes{ceph_daemon="osd.3"} 63726157824.0
>>>>>>>>>>
>>>>>>>>>> ceph_bluefs_db_used_bytes{ceph_daemon="osd.0"} 65217232896.0
>>>>>>>>>> ceph_bluefs_db_used_bytes{ceph_daemon="osd.1"} 48944381952.0
>>>>>>>>>> ceph_bluefs_db_used_bytes{ceph_daemon="osd.2"} 68093476864.0
>>>>>>>>>> ceph_bluefs_db_used_bytes{ceph_daemon="osd.3"} 63632834560.0
>>>>>>>>>>
>>>>>>>>>> ceph_osd_stat_bytes{ceph_daemon="osd.0"} 471305551872.0
>>>>>>>>>> ceph_osd_stat_bytes{ceph_daemon="osd.1"} 471305551872.0
>>>>>>>>>> ceph_osd_stat_bytes{ceph_daemon="osd.2"} 471305551872.0
>>>>>>>>>> ceph_osd_stat_bytes{ceph_daemon="osd.3"} 471305551872.0
>>>>>>>>>>
>>>>>>>>>> ceph_osd_stat_bytes_used{ceph_daemon="osd.0"} 222328213504.0
>>>>>>>>>> ceph_osd_stat_bytes_used{ceph_daemon="osd.1"} 214472544256.0
>>>>>>>>>> ceph_osd_stat_bytes_used{ceph_daemon="osd.2"} 163603996672.0
>>>>>>>>>> ceph_osd_stat_bytes_used{ceph_daemon="osd.3"} 212806815744.0
>>>>>>>>>>
>>>>>>>>>> The first OSD crashed while doing DB compaction; the others crashed
>>>>>>>>>> shortly afterwards during backfilling. The workload was
>>>>>>>>>> "cephfs-data-scan scan_inodes" filling the metadata pool located on
>>>>>>>>>> these OSDs at a rate close to 10k objects/second.
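A quick back-of-the-envelope check on the figures above for osd.1 shows the pattern described later in the thread - BlueFS/DB space essentially exhausted while the OSD itself is less than half full (a sketch; the same counters can also be read off the admin socket without the prometheus module):

    awk 'BEGIN { printf "osd.1: bluefs db %.1f%% full, osd %.1f%% full\n",
                 100*48944381952/49013587968, 100*214472544256/471305551872 }'
    # osd.1: bluefs db 99.9% full, osd 45.5% full

    ceph daemon osd.1 perf dump bluefs   # db_total_bytes / db_used_bytes, live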
>>>>>>>>>> Here is the log excerpt of the first crash occurrence:
>>>>>>>>>>
>>>>>>>>>> 2018-10-01 03:27:12.762 7fbf16dd6700  0 bluestore(/var/lib/ceph/osd/ceph-1) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x1000
>>>>>>>>>> 2018-10-01 03:27:12.886 7fbf1e5e5700  4 rocksdb: [/build/ceph-13.2.2/src/rocksdb/db/compaction_job.cc:1166] [default] [JOB 24] Generated table #89741: 106356 keys, 68110589 bytes
>>>>>>>>>> 2018-10-01 03:27:12.886 7fbf1e5e5700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1538353632892744, "cf_name": "default", "job": 24, "event": "table_file_creation", "file_number": 89741, "file_size": 68110589, "table_properties": {"data_size": 67112903, "index_size": 579319, "filter_size": 417316, "raw_key_size": 6733561, "raw_average_key_size": 63, "raw_value_size": 60994583, "raw_average_value_size": 573, "num_data_blocks": 16336, "num_entries": 106356, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "14444", "kMergeOperands": "0"}}
>>>>>>>>>> 2018-10-01 03:27:12.934 7fbf1e5e5700  4 rocksdb: [/build/ceph-13.2.2/src/rocksdb/db/compaction_job.cc:1166] [default] [JOB 24] Generated table #89742: 23214 keys, 16352315 bytes
>>>>>>>>>> 2018-10-01 03:27:12.934 7fbf1e5e5700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1538353632938670, "cf_name": "default", "job": 24, "event": "table_file_creation", "file_number": 89742, "file_size": 16352315, "table_properties": {"data_size": 16116986, "index_size": 139894, "filter_size": 94386, "raw_key_size": 1470883, "raw_average_key_size": 63, "raw_value_size": 14775006, "raw_average_value_size": 636, "num_data_blocks": 3928, "num_entries": 23214, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "90", "kMergeOperands": "0"}}
>>>>>>>>>> 2018-10-01 03:27:13.042 7fbf1e5e5700  1 bluefs _allocate failed to allocate 0x4100000 on bdev 1, free 0x1a00000; fallback to bdev 2
>>>>>>>>>> 2018-10-01 03:27:13.042 7fbf1e5e5700 -1 bluefs _allocate failed to allocate 0x4100000 on bdev 2, dne
>>>>>>>>>> 2018-10-01 03:27:13.042 7fbf1e5e5700 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 0x40ea9f1
>>>>>>>>>> 2018-10-01 03:27:13.046 7fbf1e5e5700 -1 /build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 7fbf1e5e5700 time 2018-10-01 03:27:13.048298
>>>>>>>>>> /build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: 1663: FAILED assert(0 == "bluefs enospc")
>>>>>>>>>>
>>>>>>>>>>  ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7fbf2d4fe5c2]
>>>>>>>>>>  2: (()+0x26c787) [0x7fbf2d4fe787]
>>>>>>>>>>  3: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x1ab4) [0x5619325114b4]
>>>>>>>>>>  4: (BlueRocksWritableFile::Flush()+0x3d) [0x561932527c1d]
>>>>>>>>>>  5: (rocksdb::WritableFileWriter::Flush()+0x1b9) [0x56193271c399]
>>>>>>>>>>  6: (rocksdb::WritableFileWriter::Sync(bool)+0x3b) [0x56193271d42b]
>>>>>>>>>>  7: (rocksdb::CompactionJob::FinishCompactionOutputFile(rocksdb::Status const&,
>>>>>>>>>>     rocksdb::CompactionJob::SubcompactionState*, rocksdb::RangeDelAggregator*, CompactionIterationStats*, rocksdb::Slice const*)+0x3db) [0x56193276098b]
>>>>>>>>>>  8: (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::CompactionJob::SubcompactionState*)+0x7d9) [0x561932763da9]
>>>>>>>>>>  9: (rocksdb::CompactionJob::Run()+0x314) [0x561932765504]
>>>>>>>>>>  10: (rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*)+0xc54) [0x5619325b5c44]
>>>>>>>>>>  11: (rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority)+0x397) [0x5619325b8557]
>>>>>>>>>>  12: (rocksdb::DBImpl::BGWorkCompaction(void*)+0x97) [0x5619325b8cd7]
>>>>>>>>>>  13: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x266) [0x5619327a5e36]
>>>>>>>>>>  14: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x47) [0x5619327a5fb7]
>>>>>>>>>>  15: (()+0xbe733) [0x7fbf2b500733]
>>>>>>>>>>  16: (()+0x76db) [0x7fbf2bbf86db]
>>>>>>>>>>  17: (clone()+0x3f) [0x7fbf2abbc88f]
>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>>>>>>
>>>>>>>>>> On 1.10.2018, at 15:01, Igor Fedotov <ifedo...@suse.de> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Sergey,
>>>>>>>>>>>
>>>>>>>>>>> could you please provide more details on your OSDs?
>>>>>>>>>>>
>>>>>>>>>>> What are the sizes of the DB/block devices?
>>>>>>>>>>>
>>>>>>>>>>> Do you have any modifications in your BlueStore config settings?
>>>>>>>>>>>
>>>>>>>>>>> Can you share the stats you're referring to?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Igor
>>>>>>>>>>>
>>>>>>>>>>> On 10/1/2018 12:29 PM, Sergey Malinin wrote:
>>>>>>>>>>>> Hello,
>>>>>>>>>>>> 3 of 4 NVMe OSDs crashed at the same time on assert(0 == "bluefs
>>>>>>>>>>>> enospc") and no longer start.
>>>>>>>>>>>> Stats collected just before the crash show that
>>>>>>>>>>>> ceph_bluefs_db_used_bytes is at 100%. Although the OSDs have over
>>>>>>>>>>>> 50% free space, it is not reallocated for DB usage.
>>>>>>>>>>>>
>>>>>>>>>>>> 2018-10-01 12:18:06.744 7f1d6a04d240  1 bluefs _allocate failed to allocate 0x100000 on bdev 1, free 0x0; fallback to bdev 2
>>>>>>>>>>>> 2018-10-01 12:18:06.744 7f1d6a04d240 -1 bluefs _allocate failed to allocate 0x100000 on bdev 2, dne
>>>>>>>>>>>> 2018-10-01 12:18:06.744 7f1d6a04d240 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 0xa8700
>>>>>>>>>>>> 2018-10-01 12:18:06.748 7f1d6a04d240 -1 /build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 7f1d6a04d240 time 2018-10-01 12:18:06.746800
>>>>>>>>>>>> /build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: 1663: FAILED assert(0 == "bluefs enospc")
>>>>>>>>>>>>
>>>>>>>>>>>>  ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
>>>>>>>>>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7f1d6146f5c2]
>>>>>>>>>>>>  2: (()+0x26c787) [0x7f1d6146f787]
>>>>>>>>>>>>  3: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x1ab4) [0x5586b22684b4]
>>>>>>>>>>>>  4: (BlueRocksWritableFile::Flush()+0x3d) [0x5586b227ec1d]
>>>>>>>>>>>>  5: (rocksdb::WritableFileWriter::Flush()+0x1b9) [0x5586b2473399]
>>>>>>>>>>>>  6: (rocksdb::WritableFileWriter::Sync(bool)+0x3b) [0x5586b247442b]
>>>>>>>>>>>>  7: (rocksdb::BuildTable(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::Env*, rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&, rocksdb::EnvOptions const&, rocksdb::TableCache*, rocksdb::InternalIterator*, std::unique_ptr<rocksdb::InternalIterator, std::default_delete<rocksdb::InternalIterator> >, rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&, std::vector<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> >, std::allocator<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> > > > const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<unsigned long, std::allocator<unsigned long> >, unsigned long, rocksdb::SnapshotChecker*, rocksdb::CompressionType, rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*, rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int, rocksdb::Env::IOPriority, rocksdb::TableProperties*, int, unsigned long, unsigned long, rocksdb::Env::WriteLifeTimeHint)+0x1e24) [0x5586b249ef94]
>>>>>>>>>>>>  8: (rocksdb::DBImpl::WriteLevel0TableForRecovery(int, rocksdb::ColumnFamilyData*, rocksdb::MemTable*, rocksdb::VersionEdit*)+0xcb7) [0x5586b2321457]
>>>>>>>>>>>>  9: (rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long*, bool)+0x19de) [0x5586b232373e]
>>>>>>>>>>>>  10: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool)+0x5d4) [0x5586b23242f4]
>>>>>>>>>>>>  11: (rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool)+0x68b) [0x5586b232559b]
>>>>>>>>>>>>  12: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0x22) [0x5586b2326e72]
>>>>>>>>>>>>  13: (RocksDBStore::do_open(std::ostream&, bool, std::vector<KeyValueDB::ColumnFamily, std::allocator<KeyValueDB::ColumnFamily> > const*)+0x170c) [0x5586b220219c]
>>>>>>>>>>>>  14: (BlueStore::_open_db(bool, bool)+0xd8e) [0x5586b218ee1e]
>>>>>>>>>>>>  15: (BlueStore::_mount(bool, bool)+0x4b7) [0x5586b21bf807]
>>>>>>>>>>>>  16: (OSD::init()+0x295) [0x5586b1d673c5]
>>>>>>>>>>>>  17: (main()+0x268d) [0x5586b1c554ed]
>>>>>>>>>>>>  18: (__libc_start_main()+0xe7) [0x7f1d5ea2db97]
>>>>>>>>>>>>  19: (_start()+0x2a) [0x5586b1d1d7fa]
>>>>>>>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com