Hi,

I can check if this would change anything, but we are currently trying to find a different solution. The issue we ran into when using rados directly as a backend with a BlueStore OSD was that every object seems to be cached in the OSD, so the memory consumption of the OSD grew considerably. This is not workable for us, since the objects are accessed rarely and are retained for a very long time. We are therefore now evaluating RBD with a database or a filesystem on top (sketched below), which would handle the huge number of small objects. The drawback is that a filesystem or a database can become inconsistent more easily than a rados-only approach. CephFS was not the right approach either, since its space consumption would be the same as with rados directly.
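For reference, the filesystem-on-RBD variant we are evaluating would look roughly like this. This is only a sketch: it assumes the kernel rbd client, and the pool, image name, size and mount point are placeholders.

    # create and map an RBD image in an existing pool
    rbd create test/smallobjects --size 102400   # 100 GB image (size is in MB)
    rbd map test/smallobjects                    # returns a device, e.g. /dev/rbd0
    # put a regular filesystem on top and mount it; XFS is just one option
    mkfs.xfs /dev/rbd0
    mount /dev/rbd0 /mnt/smallobjects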
Thanks to everybody,
Marcus Haarmann

From: "Pavel Shub" <pa...@citymaps.com>
To: "Gregory Farnum" <gfar...@redhat.com>
Cc: "Wido den Hollander" <w...@42on.com>, "ceph-users" <ceph-users@lists.ceph.com>, "Marcus Haarmann" <marcus.haarm...@midoco.de>
Sent: Tuesday, 8 August 2017 17:50:44
Subject: Re: [ceph-users] CEPH bluestore space consumption with small objects

Marcus,

You may want to look at the bluestore_min_alloc_size setting, as well as the respective bluestore_min_alloc_size_ssd and bluestore_min_alloc_size_hdd. By default bluestore sets a 64k block size for ssds. I'm also using ceph for small objects and I've seen my OSD usage go down from 80% to 20% after setting the min alloc size to 4k. (A configuration sketch follows after the quoted thread below.)

Thanks,
Pavel

On Thu, Aug 3, 2017 at 3:59 PM, Gregory Farnum <gfar...@redhat.com> wrote:
> Don't forget that at those sizes the internal journals and rocksdb size
> tunings are likely to be a significant fixed cost.
>
> On Thu, Aug 3, 2017 at 3:13 AM Wido den Hollander <w...@42on.com> wrote:
>>
>> > On 2 August 2017 at 17:55, Marcus Haarmann <marcus.haarm...@midoco.de> wrote:
>> >
>> > Hi,
>> > we are doing some tests here with a Kraken setup using the bluestore
>> > backend (on Ubuntu 64 bit).
>> > We are trying to store > 10 million very small objects using RADOS
>> > (no fs, no rbd, only OSDs and monitors).
>> >
>> > The setup was done with ceph-deploy, using the standard bluestore
>> > option, no separate devices for the WAL. The test cluster spreads over
>> > 3 virtual machines, each with 100 GB storage for the OSD.
>> >
>> > We are now in the following situation (the pool used is "test"):
>> >
>> > rados df
>> > POOL_NAME USED   OBJECTS CLONES COPIES  MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD     WR_OPS WR
>> > rbd       0      2       0      6       0                  0       0        49452  39618k 855    12358k
>> > test      17983M 595427  0      1786281 0                  0       0        29     77824  596426 17985M
>> >
>> > total_objects 595429
>> > total_used    141G
>> > total_avail   158G
>> > total_space   299G
>> >
>> > ceph osd df
>> > ID WEIGHT  REWEIGHT SIZE    USE    AVAIL  %USE  VAR  PGS
>> >  0 0.09760 1.00000  102298M 50763M 51535M 49.62 1.00 72
>> >  1 0.09760 1.00000  102298M 50799M 51499M 49.66 1.00 72
>> >  2 0.09760 1.00000  102298M 50814M 51484M 49.67 1.00 72
>> >              TOTAL  299G    148G   150G   49.65
>> > MIN/MAX VAR: 1.00/1.00 STDDEV: 0.02
>> >
>> > As you can see, about 18 GB of data is stored in ~595,000 objects now.
>> > The actual space consumption is about 150 GB, which fills about half
>> > of the storage.
>> >
>>
>> Not really. Each OSD uses 50GB, but since you replicate 3 times (the
>> default) it's storing 150GB spread out over 3 OSDs.
>>
>> So your data is 18GB, but consumes 50GB. That's still ~2.5x, which is a
>> lot, but a lot less than 150GB.
>>
>> > Objects have been added with a test script using the rados command
>> > line (put).
>> >
>> > Obviously, the stored objects are counted byte by byte in the rados df
>> > command, but the real space allocation is about a factor of 8.
>> >
>>
>> As written above, it's ~2.5x, not 8x.
>>
>> > The stored objects are a mixture of 2kb, 10kb, 50kb and 100kb objects.
>> >
>> > Is there any recommended way to configure bluestore with a block size
>> > better suited to those small objects? I cannot find any configuration
>> > option which would allow modification of the internal block handling
>> > of bluestore.
>> > Is Luminous an option which allows more specific configuration?
>> >
>>
>> Could you try this with the Luminous RC as well? I don't know the answer
>> here, but since Kraken a LOT has been improved in BlueStore.
>>
>> Wido
>>
>> > Thank you all in advance for support.
>> >
>> > Marcus Haarmann
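For anyone wanting to try the same change: Pavel's bluestore_min_alloc_size suggestion would look roughly like the following in ceph.conf. This is only a sketch, untested here, and the 4k value is simply the one Pavel reports; note that the allocation size is fixed when an OSD is created, so existing OSDs would have to be redeployed for a change to take effect.

    [osd]
    # 4k allocation unit for both device classes (value taken from
    # Pavel's report above; not verified here).
    # bluestore_min_alloc_size is applied at OSD creation (mkfs) time,
    # so changing it requires re-creating the OSD.
    bluestore_min_alloc_size_ssd = 4096
    bluestore_min_alloc_size_hdd = 4096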
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com