We changed these settings. Our config is now:

bluestore_rocksdb_options =
"compression=kSnappyCompression,max_write_buffer_number=16,min_write_buffer_number_to_merge=3,recycle_log_file_num=16,compaction_style=kCompactionStyleLevel,write_buffer_size=50331648,target_file_size_base=50331648,max_background_compactions=31,level0_file_num_compaction_trigger=4,level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,num_levels=5,max_bytes_for_level_base=603979776,max_bytes_for_level_multiplier=10,compaction_threads=32,flusher_threads=8"

These settings can be changed without redeploying the OSDs; the new options
take effect on the SST files as compaction rewrites them. The additional
improvement is Snappy compression. We rebuilt Ceph with support for it. I can
create a PR for it if you want :)
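
For anyone wanting to try the same, a minimal sketch of one way to apply it
(assuming a Mimic/Nautilus-style central config database; on older releases
the same string can go into ceph.conf under [osd] instead, and <id> is just a
placeholder for each OSD):

  # store the option centrally for all OSDs (value exactly as quoted above)
  ceph config set osd bluestore_rocksdb_options "<the option string above>"

  # rocksdb only re-reads its options at startup, so restart the OSDs in turn
  systemctl restart ceph-osd@<id>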


Best Regards,

Rafał Wądołowski
Cloud & Security Engineer

On 25.06.2019 22:16, Christian Wuerdig wrote:
> The sizes are determined by rocksdb settings - some details can be
> found here: https://tracker.ceph.com/issues/24361
> One thing to note, in this thread
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030775.html
> it's noted that rocksdb could use up to 100% extra space during
> compaction, so if you want to avoid spillover during compaction the
> safer values would be 6/60/600 GB.
>
> You can change max_bytes_for_level_base and
> max_bytes_for_level_multiplier to suit your needs better, but I'm not
> sure if that can be changed on the fly or if you have to re-create
> OSDs in order to make them apply.
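
To make those numbers concrete - this is only my back-of-the-envelope reading,
assuming the stock defaults (max_bytes_for_level_base = 256 MB,
max_bytes_for_level_multiplier = 10), not a measurement:

  L1 = max_bytes_for_level_base  =  256 MB
  L2 = L1 * 10                   = 2.56 GB
  L3 = L2 * 10                   = 25.6 GB
  L4 = L3 * 10                   =  256 GB

  cumulative: L1+L2 ~ 3 GB, +L3 ~ 30 GB, +L4 ~ 300 GB

As I understand it, a level only really benefits from the fast device if it
fits there entirely, hence the 3/30/300 GB steps - and roughly double that if
you also budget for the compaction overhead mentioned above.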
>
> On Tue, 25 Jun 2019 at 18:06, Rafał Wądołowski
> <rwadolow...@cloudferro.com <mailto:rwadolow...@cloudferro.com>> wrote:
>
>     Why did you select these specific sizes? Are there any
>     tests or research on this?
>
>
>     Best Regards,
>
>     Rafał Wądołowski
>
>     On 24.06.2019 13:05, Konstantin Shalygin wrote:
>>
>>>     Hi
>>>
>>>     Have been thinking a bit about rocksdb and EC pools:
>>>
>>>     Since a RADOS object written to an EC(k+m) pool is split into
>>>     several smaller pieces, each OSD will receive many more, smaller
>>>     objects than it would in a replicated setup.
>>>
>>>     This must mean that rocksdb will also need to handle many more
>>>     entries and will grow faster. This will have an impact when using
>>>     bluestore on slow HDDs with the DB on SSD drives, where the
>>>     faster-growing rocksdb might result in spillover to the slow store -
>>>     if not taken into consideration when designing the disk layout.
>>>
>>>     Are my thoughts on the right track or am I missing something?
>>>
>>>     Has somebody done any measurements on rocksdb growth, comparing
>>>     replica vs EC?
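
To put rough numbers on this - purely illustrative, and both the object size
and the EC profile here are assumptions on my part:

  4 MB object, EC 8+3   ->  11 shards of 512 KB, one shard per OSD
  per OSD, per GB stored:   1 GB / 512 KB ~ 2048 onodes
  replicated, 4 MB objects: 1 GB /   4 MB ~  256 onodes

So roughly k times more onodes/metadata entries in rocksdb for the same
amount of data on the OSD.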
>>
>>     If you do not want to be affected by block.db spillover, use a
>>     3/30/300 GB partition for your block.db.
>>
>>
>>
>>     k
>>
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
