We changed these settings. Our config is now:

bluestore_rocksdb_options = "compression=kSnappyCompression,max_write_buffer_number=16,min_write_buffer_number_to_merge=3,recycle_log_file_num=16,compaction_style=kCompactionStyleLevel,write_buffer_size=50331648,target_file_size_base=50331648,max_background_compactions=31,level0_file_num_compaction_trigger=4,level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,num_levels=5,max_bytes_for_level_base=603979776,max_bytes_for_level_multiplier=10,compaction_threads=32,flusher_threads=8"
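For illustration, one way to apply it (just a sketch, assuming a plain ceph.conf plus systemd deployment; "2" is only an example OSD id):

    # /etc/ceph/ceph.conf
    [osd]
    # full value as in the line above
    bluestore_rocksdb_options = "compression=kSnappyCompression,..."

    # As far as I understand, the option is only read when BlueStore opens
    # RocksDB, so each OSD has to be restarted (but not re-created):
    systemctl restart ceph-osd@2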
These settings can be changed without redeploying the OSDs. They change the SST files when compaction is triggered. The additional improvement is Snappy compression; we rebuilt Ceph with support for it. I can create a PR for it, if you want :)

Best Regards,
Rafał Wądołowski
Cloud & Security Engineer

On 25.06.2019 22:16, Christian Wuerdig wrote:
> The sizes are determined by RocksDB settings - some details can be
> found here: https://tracker.ceph.com/issues/24361
> One thing to note: in this thread
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030775.html
> it's noted that RocksDB could use up to 100% extra space during
> compaction, so if you want to avoid spillover during compaction,
> safer values would be 6/60/600 GB.
>
> You can change max_bytes_for_level_base and
> max_bytes_for_level_multiplier to suit your needs better, but I'm not
> sure if that can be changed on the fly or if you have to re-create
> OSDs in order to make them apply.
>
> On Tue, 25 Jun 2019 at 18:06, Rafał Wądołowski <rwadolow...@cloudferro.com> wrote:
>
>     Why did you select these specific sizes? Are there any
>     tests/research on them?
>
>     Best Regards,
>
>     Rafał Wądołowski
>
>     On 24.06.2019 13:05, Konstantin Shalygin wrote:
>>
>>> Hi,
>>>
>>> I have been thinking a bit about RocksDB and EC pools:
>>>
>>> Since a RADOS object written to an EC(k+m) pool is split into several
>>> minor pieces, the OSD will receive many more, smaller objects
>>> compared to what it would receive in a replicated setup.
>>>
>>> This must mean that RocksDB will also need to handle many more
>>> entries, and will grow faster. This will have an impact when using
>>> BlueStore on slow HDDs with the DB on SSD drives, where the
>>> faster-growing RocksDB might result in spillover to the slow store
>>> if not taken into consideration when designing the disk layout.
>>>
>>> Are my thoughts on the right track, or am I missing something?
>>>
>>> Has somebody done any measurements of RocksDB growth, comparing
>>> replicated vs EC?
>>
>> If you don't want to be affected by block.db spillover, use a
>> 3/30/300 GB partition for your block.db.
>>
>> k
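A note on the arithmetic behind the 3/30/300 GB (and 6/60/600 GB) figures, as I understand it from the tracker link above - a rough sketch only, assuming the RocksDB defaults that apply when max_bytes_for_level_base / max_bytes_for_level_multiplier are not overridden (256 MB base, 10x multiplier):

    L1 target:  256 MB
    L2 target:  256 MB * 10   ~=   2.5 GB
    L3 target:  256 MB * 100  ~=  25 GB
    L4 target:  256 MB * 1000 ~= 250 GB

    A level only stays on the fast device if it fits there entirely, so the
    useful block.db sizes fall roughly at the cumulative sums:
      L1            ->  ~0.3 GB
      L1+L2         ->  ~3 GB
      L1+L2+L3      ->  ~30 GB
      L1+L2+L3+L4   ->  ~300 GB

Doubling those for the temporary space compaction can need gives the 6/60/600 GB mentioned above. With the max_bytes_for_level_base of 603979776 (~576 MB) from our settings, the same arithmetic shifts the break points to roughly 6/64/640 GB.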
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com