Yes, it improves the dynamic where only ~3, 30, 300, etc. GB of DB space could effectively be used, and thus mitigates spillover. Previously a, say, 29 GB DB device/partition would be something like 85% unused.

With recent releases one can also turn on DB compression, which should have a similar benefit.
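Roughly, from memory — double-check the exact syntax and the current default sharding spec (bluestore_rocksdb_cfs) against the docs for your release — checking and resharding an OSD, and turning on compression, look something like this:

Check whether an OSD already uses column-family sharding, with the OSD stopped:

  ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-4 show-sharding

Reshard it in place (again with the OSD down), here using the Pacific-era default spec:

  ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-4 \
    --sharding="m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P" reshard

Any spillover onto the slow device shows up under "ceph health detail" as a BLUEFS_SPILLOVER warning.

For the compression bit — assuming your release carries bluestore_rocksdb_options_annex; if not, fold the setting into bluestore_rocksdb_options itself — something like:

  ceph config set osd bluestore_rocksdb_options_annex compression=kLZ4Compression

followed by an OSD restart, and optionally a "ceph tell osd.N compact" so existing data gets rewritten compressed.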
> On Nov 12, 2024, at 11:25 AM, Frédéric Nass <frederic.n...@univ-lorraine.fr> wrote:
>
> Hi Anthony,
>
> Did the RocksDB sharding end up improving the overspilling situation related to the level thresholds? I had only anticipated that it would reduce the impact of compaction.
>
> We resharded our OSDs' RocksDBs a long time ago (after upgrading to Pacific, IIRC) and I think we could still observe overspilling at the level thresholds sometimes, if I'm not mistaken.
>
> Cheers,
> Frédéric.
>
> PS: It seems that the document you referred to is not accessible from the Internet.
>
> ----- On 12 Nov 24, at 15:11, Anthony D'Atri <anthony.da...@gmail.com> wrote:
>
> RocksDB column sharding came a while ago. It should be enabled on your OSDs, provided they weren't built on a much older release. If they were, you can update them.
>
> rocksdb_in_ceph.pdf: <https://cf2.cloudferro.com:8080/swift/v1/AUTH_5e376cddf8a94f9294259b5f48d7b2cd/ceph/rocksdb_in_ceph.pdf>
>
> IBM Storage Ceph – Administration, Resharding the RocksDB database: <https://www.ibm.com/docs/en/storage-ceph/7.1?topic=bluestore-resharding-rocksdb-database>
>
> On Nov 12, 2024, at 8:02 AM, Alexander Patrakov <patra...@gmail.com> wrote:
>
> Yes, that is correct.
>
> On Tue, Nov 12, 2024 at 8:51 PM Frédéric Nass <frederic.n...@univ-lorraine.fr> wrote:
>
> Hello Alexander,
>
> Thank you for clarifying this point. The documentation was not very clear about the 'improvements'.
>
> Does that mean that in the latest releases overspilling no longer occurs between the two thresholds of 30GB and 300GB? Meaning block.db can be 80GB in size without overspilling, for example?
>
> Cheers,
> Frédéric.
>
> ----- On 12 Nov 24, at 13:32, Alexander Patrakov patra...@gmail.com wrote:
>
> Hello Frédéric,
>
> The advice regarding 30/300 GB DB sizes is no longer valid. Since Ceph 15.2.8, due to the new default (bluestore_volume_selection_policy = use_some_extra), it no longer wastes the extra capacity of the DB device.
>
> On Tue, Nov 12, 2024 at 5:52 PM Frédéric Nass <frederic.n...@univ-lorraine.fr> wrote:
>
> ----- On 12 Nov 24, at 8:51, Roland Giesler rol...@giesler.za.net wrote:
>
> On 2024/11/12 04:54, Alwin Antreich wrote:
>
> Hi Roland,
>
> On Mon, Nov 11, 2024, 20:16 Roland Giesler <rol...@giesler.za.net> wrote:
>
> I have Ceph 17.2.6 on a Proxmox cluster and want to replace some SSDs that are end of life. I have some spinners that have their journals on SSD. Each spinner has a 50GB SSD LVM partition and I want to move each of those to new corresponding partitions.
>
> The new 4TB SSDs I have split into volumes with:
>
> # lvcreate -n NodeA-nvme-LV-RocksDB1 -L 47.69g NodeA-nvme0
> # lvcreate -n NodeA-nvme-LV-RocksDB2 -L 47.69g NodeA-nvme0
> # lvcreate -n NodeA-nvme-LV-RocksDB3 -L 47.69g NodeA-nvme0
> # lvcreate -n NodeA-nvme-LV-RocksDB4 -L 47.69g NodeA-nvme0
> # lvcreate -n NodeA-nvme-LV-data -l 100%FREE NodeA-nvme1
> # lvcreate -n NodeA-nvme-LV-data -l 100%FREE NodeA-nvme0
>
> I caution against mixing DB/WAL partitions with other applications. The performance profile may not be suited for shared use. And depending on the use case, the ~48GB might not be big enough to avoid DB spillover. See the current size when querying the OSD.
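(For context on that last point, the back-of-the-envelope arithmetic behind the oft-quoted ~3/30/300 GB figures, if memory serves: BlueStore's stock RocksDB tuning works out to a level base of roughly 256 MB with a 10x multiplier, i.e. L1 ≈ 0.25 GB, L2 ≈ 2.5 GB, L3 ≈ 25 GB, L4 ≈ 250 GB, plus a GB or two of WAL/memtables. Before use_some_extra became the default, a level only lived on the fast device if it fit there entirely, so the useful DB sizes clustered around ~3 GB, ~30 GB and ~300 GB. A ~48 GB partition would hold everything through L3 and leave the remaining ~18 GB idle while L4 spilled to the slow device; on 15.2.8+ with use_some_extra that extra space does get used.)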
> I see relatively small RocksDB and no WAL?
>
> ceph daemon osd.4 perf dump
> <snip>
>     "bluefs": {
>         "db_total_bytes": 45025845248,
>         "db_used_bytes": 2131755008,
>         "wal_total_bytes": 0,
>         "wal_used_bytes": 0,
> </snip>
>
> I have been led to understand that 4% is the high end and that it is only reached on very busy systems, if ever?
>
> Hi Roland,
>
> This is generally true, but it depends on what your cluster is used for.
>
> If your cluster is used for block (RBD) storage then 1%-2% should be enough. If your cluster is used for file (CephFS) and S3 (RGW) storage then you'd rather stay on the safe side and respect the 4% recommendation, as these workloads make heavy use of block.db to store metadata.
>
> Now, percentage is one thing, level size is another. To avoid overspilling when block.db usage approaches 30GB, you'd better choose a block.db size of 300GB+, whatever percentage of the block size that is, if you don't want to play with the RocksDB level size and multiplier, which you probably don't.
>
> Regards,
> Frédéric.
>
> [1] https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/#sizing
> [2] https://www.ibm.com/docs/en/storage-ceph/7.1?topic=bluestore-sizing-considerations
> [3] https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide
>
> What am I missing to get these changes to be permanent?
>
> Likely just an issue with the order of execution. But there is an easier way to do the move. See: https://docs.ceph.com/en/quincy/ceph-volume/lvm/migrate/
>
> Ah, excellent! I didn't find that in my searches. Will try that now.
>
> Regards,
> Roland
>
> Cheers,
> Alwin
>
> --
> Alwin Antreich
> Head of Training and Proxmox Services
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges, Andy Muthmann - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io/
>
> --
> Alexander Patrakov

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
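For the archives, the ceph-volume move Alwin pointed at looks roughly like the below — the OSD ID/FSID and the LV name (taken from Roland's layout) are placeholders, the OSD should be stopped first, and the flags are worth double-checking against the docs for your release:

  ceph-volume lvm list                 # note the osd fsid of the OSD being moved
  ceph-volume lvm migrate --osd-id 4 --osd-fsid <osd-fsid> \
    --from db --target NodeA-nvme0/NodeA-nvme-LV-RocksDB1

If an OSD has no separate DB device yet, "ceph-volume lvm new-db" is the companion command for attaching one rather than migrating an existing one.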