I was able to apply the patches to mimic, but nothing changed. The one OSD 
whose space I had expanded fails with a BlueFS mount I/O error; the others 
keep failing with ENOSPC.


> On 1.10.2018, at 19:26, Igor Fedotov <ifedo...@suse.de> wrote:
> 
> So you should call repair, which rebalances (i.e. allocates additional) 
> BlueFS space, hence allowing the OSD to start.
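> 
> For reference, a minimal sketch of how that would be run once such a patched 
> build is installed (the OSD data path below is just the one used elsewhere in 
> this thread; adjust it and stop the OSD first):
> 
>     # run repair against the OSD data path; with the patch it should also
>     # rebalance/gift BlueFS space
>     ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-1
>     # then try starting the OSD again (systemd deployment assumed)
>     systemctl start ceph-osd@1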
> 
> Thanks,
> 
> Igor
> 
> 
> On 10/1/2018 7:22 PM, Igor Fedotov wrote:
>> Not exactly. The rebalancing from this kv_sync_thread might still be 
>> deferred due to the nature of this thread (I'm not 100% sure though).
>> 
>> Here is my PR showing the idea (still untested and perhaps unfinished!!!)
>> 
>> https://github.com/ceph/ceph/pull/24353
>> 
>> 
>> Igor
>> 
>> 
>> On 10/1/2018 7:07 PM, Sergey Malinin wrote:
>>> Can you please confirm whether I got this right:
>>> 
>>> --- BlueStore.cc.bak    2018-10-01 18:54:45.096836419 +0300
>>> +++ BlueStore.cc    2018-10-01 19:01:35.937623861 +0300
>>> @@ -9049,22 +9049,17 @@
>>>         throttle_bytes.put(costs);
>>>           PExtentVector bluefs_gift_extents;
>>> -      if (bluefs &&
>>> -      after_flush - bluefs_last_balance >
>>> -      cct->_conf->bluestore_bluefs_balance_interval) {
>>> -    bluefs_last_balance = after_flush;
>>> -    int r = _balance_bluefs_freespace(&bluefs_gift_extents);
>>> -    assert(r >= 0);
>>> -    if (r > 0) {
>>> -      for (auto& p : bluefs_gift_extents) {
>>> -        bluefs_extents.insert(p.offset, p.length);
>>> -      }
>>> -      bufferlist bl;
>>> -      encode(bluefs_extents, bl);
>>> -      dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
>>> -           << bluefs_extents << std::dec << dendl;
>>> -      synct->set(PREFIX_SUPER, "bluefs_extents", bl);
>>> +      int r = _balance_bluefs_freespace(&bluefs_gift_extents);
>>> +      ceph_assert(r >= 0);
>>> +      if (r > 0) {
>>> +    for (auto& p : bluefs_gift_extents) {
>>> +      bluefs_extents.insert(p.offset, p.length);
>>>       }
>>> +    bufferlist bl;
>>> +    encode(bluefs_extents, bl);
>>> +    dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
>>> +         << bluefs_extents << std::dec << dendl;
>>> +    synct->set(PREFIX_SUPER, "bluefs_extents", bl);
>>>         }
>>>           // cleanup sync deferred keys
>>> 
>>>> On 1.10.2018, at 18:39, Igor Fedotov <ifedo...@suse.de> wrote:
>>>> 
>>>> So you have just a single main device per OSD....
>>>> 
>>>> Then bluestore-tool won't help; it is unable to expand the BlueFS 
>>>> partition on the main device, only standalone devices are supported.
>>>> 
>>>> Given that you're able to rebuild the code, I can suggest making a patch 
>>>> that triggers a BlueFS rebalance (see the code snippet below) during repair.
>>>>      PExtentVector bluefs_gift_extents;
>>>>      int r = _balance_bluefs_freespace(&bluefs_gift_extents);
>>>>      ceph_assert(r >= 0);
>>>>      if (r > 0) {
>>>>        for (auto& p : bluefs_gift_extents) {
>>>>          bluefs_extents.insert(p.offset, p.length);
>>>>        }
>>>>        bufferlist bl;
>>>>        encode(bluefs_extents, bl);
>>>>        dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
>>>>             << bluefs_extents << std::dec << dendl;
>>>>        synct->set(PREFIX_SUPER, "bluefs_extents", bl);
>>>>      }
>>>> 
>>>> If it can wait, I can probably make a corresponding PR tomorrow.
>>>> 
>>>> Thanks,
>>>> Igor
>>>> On 10/1/2018 6:16 PM, Sergey Malinin wrote:
>>>>> I have rebuilt the tool, but none of my OSDs, dead or alive, have any 
>>>>> symlink other than 'block' pointing to LVM.
>>>>> I adjusted the main device size, but it looks like it needs even more 
>>>>> space for DB compaction. After executing bluefs-bdev-expand the OSD fails 
>>>>> to start, although the 'fsck' and 'repair' commands finished successfully.
>>>>> 
>>>>> 2018-10-01 18:02:39.755 7fc9226c6240  1 freelist init
>>>>> 2018-10-01 18:02:39.763 7fc9226c6240  1 
>>>>> bluestore(/var/lib/ceph/osd/ceph-1) _open_alloc opening allocation 
>>>>> metadata
>>>>> 2018-10-01 18:02:40.907 7fc9226c6240  1 
>>>>> bluestore(/var/lib/ceph/osd/ceph-1) _open_alloc loaded 285 GiB in 2249899 
>>>>> extents
>>>>> 2018-10-01 18:02:40.951 7fc9226c6240 -1 
>>>>> bluestore(/var/lib/ceph/osd/ceph-1) _reconcile_bluefs_freespace bluefs 
>>>>> extra 0x[6d6f000000~50c800000]
>>>>> 2018-10-01 18:02:40.951 7fc9226c6240  1 stupidalloc 0x0x55d053fb9180 
>>>>> shutdown
>>>>> 2018-10-01 18:02:40.963 7fc9226c6240  1 freelist shutdown
>>>>> 2018-10-01 18:02:40.963 7fc9226c6240  4 rocksdb: 
>>>>> [/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:252] Shutdown: canceling 
>>>>> all background work
>>>>> 2018-10-01 18:02:40.967 7fc9226c6240  4 rocksdb: 
>>>>> [/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:397] Shutdown complete
>>>>> 2018-10-01 18:02:40.971 7fc9226c6240  1 bluefs umount
>>>>> 2018-10-01 18:02:40.975 7fc9226c6240  1 stupidalloc 0x0x55d053883800 
>>>>> shutdown
>>>>> 2018-10-01 18:02:40.975 7fc9226c6240  1 bdev(0x55d053c32e00 
>>>>> /var/lib/ceph/osd/ceph-1/block) close
>>>>> 2018-10-01 18:02:41.267 7fc9226c6240  1 bdev(0x55d053c32a80 
>>>>> /var/lib/ceph/osd/ceph-1/block) close
>>>>> 2018-10-01 18:02:41.443 7fc9226c6240 -1 osd.1 0 OSD:init: unable to mount 
>>>>> object store
>>>>> 2018-10-01 18:02:41.443 7fc9226c6240 -1  ** ERROR: osd init failed: (5) 
>>>>> Input/output error
>>>>> 
>>>>> 
>>>>>> On 1.10.2018, at 18:09, Igor Fedotov <ifedo...@suse.de> wrote:
>>>>>> 
>>>>>> Well, actually you can avoid rebuilding bluestore-tool.
>>>>>> 
>>>>>> You'll need to edit the first chunk of block.db where the labels are 
>>>>>> stored. (Please make a backup first!!!)
>>>>>> 
>>>>>> The size label is stored at offset 0x52 and is 8 bytes long, encoded as 
>>>>>> a little-endian 64-bit integer. (Please verify that the old value at this 
>>>>>> offset exactly corresponds to your original volume size and/or the 'size' 
>>>>>> label reported by ceph-bluestore-tool.)
>>>>>> 
>>>>>> So you have to put the new DB volume size there. Or you can send me the 
>>>>>> first 4K chunk (e.g. extracted with dd) along with the new DB volume size 
>>>>>> (in bytes) and I'll do that for you.
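>>>>>> 
>>>>>> For what it's worth, a minimal sketch of that edit (DEV and NEW_SIZE are 
>>>>>> placeholders, not values from this thread; verify the bytes you read back 
>>>>>> against the old size before writing anything):
>>>>>> 
>>>>>>     DEV=/var/lib/ceph/osd/ceph-1/block   # device/symlink holding the label (placeholder)
>>>>>>     NEW_SIZE=491805551872                # new volume size in bytes (placeholder)
>>>>>>     # back up the first 4K label chunk
>>>>>>     dd if=$DEV of=/root/osd1-label.bak bs=4096 count=1
>>>>>>     # dump the current 8 bytes at offset 0x52 (little-endian size)
>>>>>>     xxd -s 0x52 -l 8 $DEV
>>>>>>     # write the new size as a little-endian 64-bit integer at offset 0x52
>>>>>>     python3 -c 'import sys,struct; sys.stdout.buffer.write(struct.pack("<Q", int(sys.argv[1])))' $NEW_SIZE |
>>>>>>         dd of=$DEV bs=1 seek=$((0x52)) count=8 conv=notrunc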
>>>>>> 
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Igor
>>>>>> 
>>>>>> 
>>>>>> On 10/1/2018 5:32 PM, Igor Fedotov wrote:
>>>>>>> On 10/1/2018 5:03 PM, Sergey Malinin wrote:
>>>>>>>> Before I received your response, I had already added 20GB to the OSD 
>>>>>>>> (by expanding the LV followed by bluefs-bdev-expand) and ran 
>>>>>>>> "ceph-kvstore-tool bluestore-kv <path> compact", however it still 
>>>>>>>> needs more space.
>>>>>>>> Is that because I didn't update the DB size with set-label-key?
>>>>>>> In mimic you need to run both the "bluefs-bdev-expand" and 
>>>>>>> "set-label-key" commands to commit the BlueFS volume expansion.
>>>>>>> Unfortunately the latter command doesn't handle the "size" label 
>>>>>>> properly. That's why you might need to backport the mentioned commits 
>>>>>>> and rebuild.
>>>>>>> 
>>>>>>>> What exactly is the label key that needs to be updated? I couldn't 
>>>>>>>> find which one is related to the DB:
>>>>>>>> 
>>>>>>>> # ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-1
>>>>>>>> inferring bluefs devices from bluestore path
>>>>>>>> {
>>>>>>>>       "/var/lib/ceph/osd/ceph-1/block": {
>>>>>>>>           "osd_uuid": "f8f122ee-70a6-4c54-8eb0-9b42205b1ecc",
>>>>>>>>           "size": 471305551872,
>>>>>>>>           "btime": "2018-07-31 03:06:43.751243",
>>>>>>>>           "description": "main",
>>>>>>>>           "bluefs": "1",
>>>>>>>>           "ceph_fsid": "7d320499-5b3f-453e-831f-60d4db9a4533",
>>>>>>>>           "kv_backend": "rocksdb",
>>>>>>>>           "magic": "ceph osd volume v026",
>>>>>>>>           "mkfs_done": "yes",
>>>>>>>>           "osd_key": "XXX",
>>>>>>>>           "ready": "ready",
>>>>>>>>           "whoami": "1"
>>>>>>>>       }
>>>>>>>> }
>>>>>>> The 'size' label, but your output is for the block (aka slow) device.
>>>>>>> 
>>>>>>> It should return labels for the DB/WAL devices as well (the block.db 
>>>>>>> and block.wal symlinks respectively). It works for me in master, though 
>>>>>>> I can't verify with mimic at the moment.
>>>>>>> Here is output for master:
>>>>>>> 
>>>>>>> # bin/ceph-bluestore-tool show-label --path dev/osd0
>>>>>>> inferring bluefs devices from bluestore path
>>>>>>> {
>>>>>>>      "dev/osd0/block": {
>>>>>>>          "osd_uuid": "404dcbe9-3f8d-4ef5-ac59-2582454a9a75",
>>>>>>>          "size": 21474836480,
>>>>>>>          "btime": "2018-09-10 15:55:09.044039",
>>>>>>>          "description": "main",
>>>>>>>          "bluefs": "1",
>>>>>>>          "ceph_fsid": "56eddc15-11b9-4e0b-9192-e391fbae551c",
>>>>>>>          "kv_backend": "rocksdb",
>>>>>>>          "magic": "ceph osd volume v026",
>>>>>>>          "mkfs_done": "yes",
>>>>>>>          "osd_key": "AQCsaZZbYTxXJBAAe3jJI4p6WbMjvA8CBBUJbA==",
>>>>>>>          "ready": "ready",
>>>>>>>          "whoami": "0"
>>>>>>>      },
>>>>>>>      "dev/osd0/block.wal": {
>>>>>>>          "osd_uuid": "404dcbe9-3f8d-4ef5-ac59-2582454a9a75",
>>>>>>>          "size": 1048576000,
>>>>>>>          "btime": "2018-09-10 15:55:09.044985",
>>>>>>>          "description": "bluefs wal"
>>>>>>>      },
>>>>>>>      "dev/osd0/block.db": {
>>>>>>>          "osd_uuid": "404dcbe9-3f8d-4ef5-ac59-2582454a9a75",
>>>>>>>          "size": 1048576000,
>>>>>>>          "btime": "2018-09-10 15:55:09.044469",
>>>>>>>          "description": "bluefs db"
>>>>>>>      }
>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>>> You can try --dev option instead of --path, e.g.
>>>>>>> ceph-bluestore-tool show-label --dev <path-to-block.db>
>>>>>>> 
>>>>>>> 
>>>>>>>>> On 1.10.2018, at 16:48, Igor Fedotov <ifedo...@suse.de> wrote:
>>>>>>>>> 
>>>>>>>>> This looks like a sort of deadlock: BlueFS needs some additional 
>>>>>>>>> space to replay the log left after the crash, and that replay happens 
>>>>>>>>> during BlueFS open.
>>>>>>>>> 
>>>>>>>>> But such space (on the slow device, since the DB is full) is only 
>>>>>>>>> gifted in the background during the BlueFS rebalance procedure, which 
>>>>>>>>> occurs after the open.
>>>>>>>>> 
>>>>>>>>> Hence the OSDs are stuck crashing permanently.
>>>>>>>>> 
>>>>>>>>> The only way to recover that I can suggest for now is to expand the 
>>>>>>>>> DB volumes. You can do that with LVM tools if you have any spare 
>>>>>>>>> space for it.
>>>>>>>>> 
>>>>>>>>> Once resized, you'll need ceph-bluestore-tool to report the volume 
>>>>>>>>> expansion to BlueFS (the bluefs-bdev-expand command) and finally to 
>>>>>>>>> update the DB volume size label with the set-label-key command.
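>>>>>>>>> 
>>>>>>>>> A minimal sketch of that sequence (the LV name, the +20G figure and 
>>>>>>>>> the block.db path are placeholders for a layout with a standalone DB 
>>>>>>>>> device, not values taken from this cluster):
>>>>>>>>> 
>>>>>>>>>     # 1. grow the LV backing the DB volume using spare space in the VG
>>>>>>>>>     lvextend -L +20G /dev/ceph-db-vg/osd-1-db
>>>>>>>>>     # 2. let BlueFS pick up the extra space
>>>>>>>>>     ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-1
>>>>>>>>>     # 3. update the 'size' label to the new device size in bytes
>>>>>>>>>     #    (on mimic this is the step that needs the backported fix below)
>>>>>>>>>     ceph-bluestore-tool set-label-key --dev /var/lib/ceph/osd/ceph-1/block.db \
>>>>>>>>>         -k size -v <new size in bytes>
>>>>>>>>>     # verify the label
>>>>>>>>>     ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-1/block.db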
>>>>>>>>> 
>>>>>>>>> The latter is a bit tricky for mimic - you might need to backport 
>>>>>>>>> https://github.com/ceph/ceph/pull/22085/commits/ffac450da5d6e09cf14b8363b35f21819b48f38b
>>>>>>>>> 
>>>>>>>>> and rebuild ceph-bluestore-tool. Alternatively you can backport 
>>>>>>>>> https://github.com/ceph/ceph/pull/22085/commits/71c3b58da4e7ced3422bce2b1da0e3fa9331530b
>>>>>>>>> 
>>>>>>>>> then bluefs expansion and label updates will occur in a single step.
>>>>>>>>> 
>>>>>>>>> I'll do these backports upstream, but it will take some time for them 
>>>>>>>>> to pass all the procedures and get into an official mimic release.
>>>>>>>>> 
>>>>>>>>> I'll file a ticket to fix the original issue as well.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> 
>>>>>>>>> Igor
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 10/1/2018 3:28 PM, Sergey Malinin wrote:
>>>>>>>>>> These are LVM BlueStore NVMe SSDs created with "ceph-volume --lvm 
>>>>>>>>>> prepare --bluestore /dev/nvme0n1p3", i.e. without specifying separate 
>>>>>>>>>> WAL/DB devices.
>>>>>>>>>> The OSDs were created with bluestore_min_alloc_size_ssd=4096; another 
>>>>>>>>>> modified setting is bluestore_cache_kv_max=1073741824.
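>>>>>>>>>> 
>>>>>>>>>> (For reference, in ceph.conf form those two settings would look 
>>>>>>>>>> roughly like the following; placing them under [osd] is an assumption:)
>>>>>>>>>> 
>>>>>>>>>>     [osd]
>>>>>>>>>>     bluestore_min_alloc_size_ssd = 4096
>>>>>>>>>>     bluestore_cache_kv_max = 1073741824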
>>>>>>>>>> 
>>>>>>>>>> DB/block usage collected by the prometheus module for the 3 failed 
>>>>>>>>>> OSDs and the one that survived:
>>>>>>>>>> 
>>>>>>>>>> ceph_bluefs_db_total_bytes{ceph_daemon="osd.0"} 65493008384.0
>>>>>>>>>> ceph_bluefs_db_total_bytes{ceph_daemon="osd.1"} 49013587968.0
>>>>>>>>>> ceph_bluefs_db_total_bytes{ceph_daemon="osd.2"} 76834406400.0 --> 
>>>>>>>>>> this one has survived
>>>>>>>>>> ceph_bluefs_db_total_bytes{ceph_daemon="osd.3"} 63726157824.0
>>>>>>>>>> 
>>>>>>>>>> ceph_bluefs_db_used_bytes{ceph_daemon="osd.0"} 65217232896.0
>>>>>>>>>> ceph_bluefs_db_used_bytes{ceph_daemon="osd.1"} 48944381952.0
>>>>>>>>>> ceph_bluefs_db_used_bytes{ceph_daemon="osd.2"} 68093476864.0
>>>>>>>>>> ceph_bluefs_db_used_bytes{ceph_daemon="osd.3"} 63632834560.0
>>>>>>>>>> 
>>>>>>>>>> ceph_osd_stat_bytes{ceph_daemon="osd.0"} 471305551872.0
>>>>>>>>>> ceph_osd_stat_bytes{ceph_daemon="osd.1"} 471305551872.0
>>>>>>>>>> ceph_osd_stat_bytes{ceph_daemon="osd.2"} 471305551872.0
>>>>>>>>>> ceph_osd_stat_bytes{ceph_daemon="osd.3"} 471305551872.0
>>>>>>>>>> 
>>>>>>>>>> ceph_osd_stat_bytes_used{ceph_daemon="osd.0"} 222328213504.0
>>>>>>>>>> ceph_osd_stat_bytes_used{ceph_daemon="osd.1"} 214472544256.0
>>>>>>>>>> ceph_osd_stat_bytes_used{ceph_daemon="osd.2"} 163603996672.0
>>>>>>>>>> ceph_osd_stat_bytes_used{ceph_daemon="osd.3"} 212806815744.0
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> The first OSD to crash was doing DB compaction; the others crashed 
>>>>>>>>>> shortly afterwards during backfilling. The workload was 
>>>>>>>>>> "ceph-data-scan scan_inodes" filling the metadata pool located on 
>>>>>>>>>> these OSDs at a rate close to 10k objects/second.
>>>>>>>>>> Here is the log excerpt from the first crash:
>>>>>>>>>> 
>>>>>>>>>> 2018-10-01 03:27:12.762 7fbf16dd6700  0 
>>>>>>>>>> bluestore(/var/lib/ceph/osd/ceph-1) _balance_bluefs_freespace no 
>>>>>>>>>> allocate on 0x80000000 min_alloc_size 0x1000
>>>>>>>>>> 2018-10-01 03:27:12.886 7fbf1e5e5700  4 rocksdb: 
>>>>>>>>>> [/build/ceph-13.2.2/src/rocksdb/db/compaction_job.cc:1166] [default] 
>>>>>>>>>> [JOB 24] Generated table #89741: 106356 keys, 68110589 bytes
>>>>>>>>>> 2018-10-01 03:27:12.886 7fbf1e5e5700  4 rocksdb: EVENT_LOG_v1 
>>>>>>>>>> {"time_micros": 1538353632892744, "cf_name": "default", "job": 24, 
>>>>>>>>>> "event": "table_file_creation", "file_number": 89741, "file_size": 
>>>>>>>>>> 68110589, "table_properties": {"data_size": 67112903, "index_size": 
>>>>>>>>>> 579319, "filter_size": 417316, "raw_key_size": 6733561, 
>>>>>>>>>> "raw_average_key_size": 63, "raw_value_size": 60994583, 
>>>>>>>>>> "raw_average_value_size": 573, "num_data_blocks": 16336, 
>>>>>>>>>> "num_entries": 106356, "filter_policy_name": 
>>>>>>>>>> "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "14444", 
>>>>>>>>>> "kMergeOperands": "0"}}
>>>>>>>>>> 2018-10-01 03:27:12.934 7fbf1e5e5700  4 rocksdb: 
>>>>>>>>>> [/build/ceph-13.2.2/src/rocksdb/db/compaction_job.cc:1166] [default] 
>>>>>>>>>> [JOB 24] Generated table #89742: 23214 keys, 16352315 bytes
>>>>>>>>>> 2018-10-01 03:27:12.934 7fbf1e5e5700  4 rocksdb: EVENT_LOG_v1 
>>>>>>>>>> {"time_micros": 1538353632938670, "cf_name": "default", "job": 24, 
>>>>>>>>>> "event": "table_file_creation", "file_number": 89742, "file_size": 
>>>>>>>>>> 16352315, "table_properties": {"data_size": 16116986, "index_size": 
>>>>>>>>>> 139894, "filter_size": 94386, "raw_key_size": 1470883, 
>>>>>>>>>> "raw_average_key_size": 63, "raw_value_size": 14775006, 
>>>>>>>>>> "raw_average_value_size": 636, "num_data_blocks": 3928, 
>>>>>>>>>> "num_entries": 23214, "filter_policy_name": 
>>>>>>>>>> "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "90", 
>>>>>>>>>> "kMergeOperands": "0"}}
>>>>>>>>>> 2018-10-01 03:27:13.042 7fbf1e5e5700  1 bluefs _allocate failed to 
>>>>>>>>>> allocate 0x4100000 on bdev 1, free 0x1a00000; fallback to bdev 2
>>>>>>>>>> 2018-10-01 03:27:13.042 7fbf1e5e5700 -1 bluefs _allocate failed to 
>>>>>>>>>> allocate 0x4100000 on bdev 2, dne
>>>>>>>>>> 2018-10-01 03:27:13.042 7fbf1e5e5700 -1 bluefs _flush_range 
>>>>>>>>>> allocated: 0x0 offset: 0x0 length: 0x40ea9f1
>>>>>>>>>> 2018-10-01 03:27:13.046 7fbf1e5e5700 -1 
>>>>>>>>>> /build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: In function 'int 
>>>>>>>>>> BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' 
>>>>>>>>>> thread 7fbf1e5e5700 time 2018-10-01 03:27:13.048298
>>>>>>>>>> /build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: 1663: FAILED assert(0 
>>>>>>>>>> == "bluefs enospc")
>>>>>>>>>> 
>>>>>>>>>>    ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) 
>>>>>>>>>> mimic (stable)
>>>>>>>>>>    1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
>>>>>>>>>> const*)+0x102) [0x7fbf2d4fe5c2]
>>>>>>>>>>    2: (()+0x26c787) [0x7fbf2d4fe787]
>>>>>>>>>>    3: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, 
>>>>>>>>>> unsigned long)+0x1ab4) [0x5619325114b4]
>>>>>>>>>>    4: (BlueRocksWritableFile::Flush()+0x3d) [0x561932527c1d]
>>>>>>>>>>    5: (rocksdb::WritableFileWriter::Flush()+0x1b9) [0x56193271c399]
>>>>>>>>>>    6: (rocksdb::WritableFileWriter::Sync(bool)+0x3b) [0x56193271d42b]
>>>>>>>>>>    7: 
>>>>>>>>>> (rocksdb::CompactionJob::FinishCompactionOutputFile(rocksdb::Status 
>>>>>>>>>> const&, rocksdb::CompactionJob::SubcompactionState*, 
>>>>>>>>>> rocksdb::RangeDelAggregator*, CompactionIterationStats*, 
>>>>>>>>>> rocksdb::Slice const*)+0x3db) [0x56193276098b]
>>>>>>>>>>    8: 
>>>>>>>>>> (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::CompactionJob::SubcompactionState*)+0x7d9)
>>>>>>>>>>  [0x561932763da9]
>>>>>>>>>>    9: (rocksdb::CompactionJob::Run()+0x314) [0x561932765504]
>>>>>>>>>>    10: (rocksdb::DBImpl::BackgroundCompaction(bool*, 
>>>>>>>>>> rocksdb::JobContext*, rocksdb::LogBuffer*, 
>>>>>>>>>> rocksdb::DBImpl::PrepickedCompaction*)+0xc54) [0x5619325b5c44]
>>>>>>>>>>    11: 
>>>>>>>>>> (rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*,
>>>>>>>>>>  rocksdb::Env::Priority)+0x397) [0x5619325b8557]
>>>>>>>>>>    12: (rocksdb::DBImpl::BGWorkCompaction(void*)+0x97) 
>>>>>>>>>> [0x5619325b8cd7]
>>>>>>>>>>    13: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned 
>>>>>>>>>> long)+0x266) [0x5619327a5e36]
>>>>>>>>>>    14: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x47) 
>>>>>>>>>> [0x5619327a5fb7]
>>>>>>>>>>    15: (()+0xbe733) [0x7fbf2b500733]
>>>>>>>>>>    16: (()+0x76db) [0x7fbf2bbf86db]
>>>>>>>>>>    17: (clone()+0x3f) [0x7fbf2abbc88f]
>>>>>>>>>>    NOTE: a copy of the executable, or `objdump -rdS <executable>` is 
>>>>>>>>>> needed to interpret this.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On 1.10.2018, at 15:01, Igor Fedotov <ifedo...@suse.de> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi Sergey,
>>>>>>>>>>> 
>>>>>>>>>>> could you please provide more details on your OSDs ?
>>>>>>>>>>> 
>>>>>>>>>>> What are sizes for DB/block devices?
>>>>>>>>>>> 
>>>>>>>>>>> Do you have any modifications in BlueStore config settings?
>>>>>>>>>>> 
>>>>>>>>>>> Can you share stats you're referring to?
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> 
>>>>>>>>>>> Igor
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On 10/1/2018 12:29 PM, Sergey Malinin wrote:
>>>>>>>>>>>> Hello,
>>>>>>>>>>>> 3 of 4 NVMe OSDs crashed at the same time on assert(0 == "bluefs 
>>>>>>>>>>>> enospc") and no longer start.
>>>>>>>>>>>> Stats collected just before the crash show that 
>>>>>>>>>>>> ceph_bluefs_db_used_bytes is at 100%. Although the OSDs have over 
>>>>>>>>>>>> 50% free space, it is not being reallocated for DB usage.
>>>>>>>>>>>> 
>>>>>>>>>>>> 2018-10-01 12:18:06.744 7f1d6a04d240  1 bluefs _allocate failed to 
>>>>>>>>>>>> allocate 0x100000 on bdev 1, free 0x0; fallback to bdev 2
>>>>>>>>>>>> 2018-10-01 12:18:06.744 7f1d6a04d240 -1 bluefs _allocate failed to 
>>>>>>>>>>>> allocate 0x100000 on bdev 2, dne
>>>>>>>>>>>> 2018-10-01 12:18:06.744 7f1d6a04d240 -1 bluefs _flush_range 
>>>>>>>>>>>> allocated: 0x0 offset: 0x0 length: 0xa8700
>>>>>>>>>>>> 2018-10-01 12:18:06.748 7f1d6a04d240 -1 
>>>>>>>>>>>> /build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: In function 'int 
>>>>>>>>>>>> BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' 
>>>>>>>>>>>> thread 7f1d6a04d240 time 2018-10-01 12:18:06.746800
>>>>>>>>>>>> /build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: 1663: FAILED 
>>>>>>>>>>>> assert(0 == "bluefs enospc")
>>>>>>>>>>>> 
>>>>>>>>>>>>    ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) 
>>>>>>>>>>>> mimic (stable)
>>>>>>>>>>>>    1: (ceph::__ceph_assert_fail(char const*, char const*, int, 
>>>>>>>>>>>> char const*)+0x102) [0x7f1d6146f5c2]
>>>>>>>>>>>>    2: (()+0x26c787) [0x7f1d6146f787]
>>>>>>>>>>>>    3: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, 
>>>>>>>>>>>> unsigned long)+0x1ab4) [0x5586b22684b4]
>>>>>>>>>>>>    4: (BlueRocksWritableFile::Flush()+0x3d) [0x5586b227ec1d]
>>>>>>>>>>>>    5: (rocksdb::WritableFileWriter::Flush()+0x1b9) [0x5586b2473399]
>>>>>>>>>>>>    6: (rocksdb::WritableFileWriter::Sync(bool)+0x3b) 
>>>>>>>>>>>> [0x5586b247442b]
>>>>>>>>>>>>    7: (rocksdb::BuildTable(std::__cxx11::basic_string<char, 
>>>>>>>>>>>> std::char_traits<char>, std::allocator<char> > const&, 
>>>>>>>>>>>> rocksdb::Env*, rocksdb::ImmutableCFOptions const&, 
>>>>>>>>>>>> rocksdb::MutableCFOptions const&, rocksdb::EnvOptions const&, rock
>>>>>>>>>>>> sdb::TableCache*, rocksdb::InternalIterator*, 
>>>>>>>>>>>> std::unique_ptr<rocksdb::InternalIterator, 
>>>>>>>>>>>> std::default_delete<rocksdb::InternalIterator> >, 
>>>>>>>>>>>> rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&, 
>>>>>>>>>>>> std::vector<std::unique_ptr<
>>>>>>>>>>>> rocksdb::IntTblPropCollectorFactory, 
>>>>>>>>>>>> std::default_delete<rocksdb::IntTblPropCollectorFactory> >, 
>>>>>>>>>>>> std::allocator<std::unique_ptr<rocksdb::IntTblPropCollectorFactory,
>>>>>>>>>>>>  std::default_delete<rocksdb::IntTblPropCollectorFactory> > > > co
>>>>>>>>>>>> nst*, unsigned int, std::__cxx11::basic_string<char, 
>>>>>>>>>>>> std::char_traits<char>, std::allocator<char> > const&, 
>>>>>>>>>>>> std::vector<unsigned long, std::allocator<unsigned long> >, 
>>>>>>>>>>>> unsigned long, rocksdb::SnapshotChecker*, rocksdb::Compression
>>>>>>>>>>>> Type, rocksdb::CompressionOptions const&, bool, 
>>>>>>>>>>>> rocksdb::InternalStats*, rocksdb::TableFileCreationReason, 
>>>>>>>>>>>> rocksdb::EventLogger*, int, rocksdb::Env::IOPriority, 
>>>>>>>>>>>> rocksdb::TableProperties*, int, unsigned long, unsigned long, 
>>>>>>>>>>>> rocksdb
>>>>>>>>>>>> ::Env::WriteLifeTimeHint)+0x1e24) [0x5586b249ef94]
>>>>>>>>>>>>    8: (rocksdb::DBImpl::WriteLevel0TableForRecovery(int, 
>>>>>>>>>>>> rocksdb::ColumnFamilyData*, rocksdb::MemTable*, 
>>>>>>>>>>>> rocksdb::VersionEdit*)+0xcb7) [0x5586b2321457]
>>>>>>>>>>>>    9: (rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long, 
>>>>>>>>>>>> std::allocator<unsigned long> > const&, unsigned long*, 
>>>>>>>>>>>> bool)+0x19de) [0x5586b232373e]
>>>>>>>>>>>>    10: 
>>>>>>>>>>>> (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor,
>>>>>>>>>>>>  std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, 
>>>>>>>>>>>> bool, bool)+0x5d4) [0x5586b23242f4]
>>>>>>>>>>>>    11: (rocksdb::DBImpl::Open(rocksdb::DBOptions const&, 
>>>>>>>>>>>> std::__cxx11::basic_string<char, std::char_traits<char>, 
>>>>>>>>>>>> std::allocator<char> > const&, 
>>>>>>>>>>>> std::vector<rocksdb::ColumnFamilyDescriptor, 
>>>>>>>>>>>> std::allocator<rocksdb::ColumnFamilyDescri
>>>>>>>>>>>> ptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, 
>>>>>>>>>>>> std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, 
>>>>>>>>>>>> bool)+0x68b) [0x5586b232559b]
>>>>>>>>>>>>    12: (rocksdb::DB::Open(rocksdb::DBOptions const&, 
>>>>>>>>>>>> std::__cxx11::basic_string<char, std::char_traits<char>, 
>>>>>>>>>>>> std::allocator<char> > const&, 
>>>>>>>>>>>> std::vector<rocksdb::ColumnFamilyDescriptor, 
>>>>>>>>>>>> std::allocator<rocksdb::ColumnFamilyDescriptor
>>>>>>>>>>>> > > const&, std::vector<rocksdb::ColumnFamilyHandle*, 
>>>>>>>>>>>> std::allocator<rocksdb::ColumnFamilyHandle*> >*, 
>>>>>>>>>>>> rocksdb::DB**)+0x22) [0x5586b2326e72]
>>>>>>>>>>>>    13: (RocksDBStore::do_open(std::ostream&, bool, 
>>>>>>>>>>>> std::vector<KeyValueDB::ColumnFamily, 
>>>>>>>>>>>> std::allocator<KeyValueDB::ColumnFamily> > const*)+0x170c) 
>>>>>>>>>>>> [0x5586b220219c]
>>>>>>>>>>>>    14: (BlueStore::_open_db(bool, bool)+0xd8e) [0x5586b218ee1e]
>>>>>>>>>>>>    15: (BlueStore::_mount(bool, bool)+0x4b7) [0x5586b21bf807]
>>>>>>>>>>>>    16: (OSD::init()+0x295) [0x5586b1d673c5]
>>>>>>>>>>>>    17: (main()+0x268d) [0x5586b1c554ed]
>>>>>>>>>>>>    18: (__libc_start_main()+0xe7) [0x7f1d5ea2db97]
>>>>>>>>>>>>    19: (_start()+0x2a) [0x5586b1d1d7fa]
>>>>>>>>>>>>    NOTE: a copy of the executable, or `objdump -rdS <executable>` 
>>>>>>>>>>>> is needed to interpret this.
>>>>>>>>>>>> 
>> 
> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
