Summary
----------
The relationship between the values configured for bluestore_min_alloc_size and 
bluefs_shared_alloc_size is reported to affect space amplification, partial 
overwrites in erasure-coded pools, and usable capacity as an OSD becomes more 
fragmented and/or more full.
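
As a rough illustration of the space-amplification concern (my own numbers, 
not taken from the discussions below): with a 65536-byte minimum allocation 
size, a 4 KiB partial overwrite in an erasure-coded pool that has to be 
written out to a new extent still occupies a full 64 KiB allocation unit on 
each affected shard, i.e. roughly 64/4 = 16x amplification for that write, 
whereas with a 4096-byte minimum allocation the same write would occupy a 
single 4 KiB unit.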

Previous discussions of this topic
----------------------------------
comment #7 in bug 63618 in Dec 2023 - 
https://tracker.ceph.com/issues/63618#note-7

pad writeup related to bug 62282 likely from late 2023 - 
https://pad.ceph.com/p/RCA_62282

email sent 13 Sept 2023 in the mailing-list thread "cannot create new osd" - 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/5M4QAXJDCNJ74XVIBIFSHHNSETCCKNMC/

comment #9 in bug 58530 likely from early 2023 - 
https://tracker.ceph.com/issues/58530#note-9

email sent 30 Sept 2021 in the mailing-list thread on flapping OSDs - 
https://www.mail-archive.com/ceph-users@ceph.io/msg13072.html

email sent 25 Feb 2020 in the mailing-list thread on changing the allocation size - 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/B3DGKH6THFGHALLX6ATJ4GGD4SVFNEKU/


Current situation
-----------------
We have three Ceph clusters that were originally built via cephadm on Octopus 
and later upgraded to Pacific. All OSDs are on HDD (we will be moving the WAL 
and DB to SSD) and were resharded after the upgrade to enable RocksDB sharding. 

The value for bluefs_shared_alloc_size has remained unchanged at 65536. 

The value for bluestore_min_alloc_size_hdd was 65536 in Octopus but is reported 
as 4096 by ceph daemon osd.<id> config show in Pacific. However, the OSD label 
after the upgrade to Pacific still shows 65536 for bfm_bytes_per_block. 
BitmapFreelistManager.h in the Ceph source 
(src/os/bluestore/BitmapFreelistManager.h) indicates that bytes_per_block is 
bdev_block_size. This suggests that the physical layout of the OSD has not 
changed from 65536 even though the ceph daemon command now reports 4096. This 
interpretation is supported by the Minimum Allocation Size section of the 
BlueStore configuration reference for Quincy 
(https://docs.ceph.com/en/quincy/rados/configuration/bluestore-config-ref/#minimum-allocation-size).
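
For reference, this is roughly how the values above can be checked (osd.0 and 
the block path below are placeholders; under cephadm the commands may need to 
be run from a cephadm shell):

  # runtime view of the configured values
  $ ceph daemon osd.0 config show | grep -E 'bluestore_min_alloc_size|bluefs_shared_alloc_size'

  # on-disk view: bfm_bytes_per_block is stamped into the OSD label at mkfs time
  $ ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block | grep bfm_bytes_per_block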


Questions
----------
What are the pros and cons of the following three cases, each with two 
variations - co-located WAL+DB on HDD versus separate WAL+DB on SSD (see the 
command sketch after the list):
1) bluefs_shared_alloc_size, bluestore_min_alloc_size, and bfm_bytes_per_block 
all equal
2) bluefs_shared_alloc_size greater than, and a multiple of, 
bluestore_min_alloc_size, with bfm_bytes_per_block equal to 
bluestore_min_alloc_size
3) bluefs_shared_alloc_size greater than, and a multiple of, 
bluestore_min_alloc_size, with bfm_bytes_per_block equal to 
bluefs_shared_alloc_size
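
For context on how I expect these cases would be arranged in practice (a 
sketch only, with placeholder values; as far as I understand, 
bluestore_min_alloc_size / bfm_bytes_per_block are fixed at mkfs time, so 
changing them means redeploying the OSD, while bluefs_shared_alloc_size is 
picked up when the OSD restarts):

  # only affects OSDs created after this is set (value is baked in at mkfs)
  $ ceph config set osd bluestore_min_alloc_size_hdd 65536

  # picked up on OSD restart; for cases 2 and 3 this would be a multiple of the min alloc size
  $ ceph config set osd bluefs_shared_alloc_size 65536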