Hi,

You are probably running into the BlueStore min_alloc_size (bluestore_min_alloc_size_hdd/ssd), which defaults to 64 KB on HDDs and 16 KB on SSDs. With k=5, m=2 you'd need objects of at least 320 KB on HDDs or 80 KB on SSDs to use the space efficiently. Last time I checked, these values are fixed when the OSD is created and cannot be changed afterwards.
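A rough back-of-the-envelope model of the effect (plain Python; it ignores BlueStore metadata and simply assumes each of the k+m chunks gets padded up to min_alloc_size, so treat the numbers as an approximation):

    import math

    def on_disk_bytes(obj_size, k=5, m=2, min_alloc=64 * 1024):
        """Approximate on-disk footprint of one RADOS object in an EC k+m pool.

        Assumes every chunk (data and coding) is padded up to the
        BlueStore min_alloc_size and ignores per-object metadata.
        """
        chunk = math.ceil(obj_size / k)                    # payload per data chunk
        chunk = math.ceil(chunk / min_alloc) * min_alloc   # padded to min_alloc_size
        return chunk * (k + m)                             # k data + m coding chunks

    # A 4 kB object on HDD OSDs (64 kB min_alloc_size):
    print(on_disk_bytes(4 * 1024))        # -> 458752 bytes, i.e. ~448 kB for 4 kB of data
    # Break-even: you need at least k * min_alloc_size of payload per object
    print(5 * 64 * 1024, 5 * 16 * 1024)   # -> 327680 (320 kB), 81920 (80 kB)

So at your 3-4 kB median size you're paying roughly two orders of magnitude of space amplification on HDD OSDs, independent of the erasure code overhead itself.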
It's not necessarily the best idea to store a lot of very small objects in RADOS (or CephFS or RGW), but it really depends on your exact requirements and access pattern.

Paul

2018-06-27 11:32 GMT+02:00 Nicolas Dandrimont <ol...@softwareheritage.org>:
> Hi,
>
> I would like to use Ceph to store a lot of small objects. Our current usage
> pattern is 4.5 billion unique objects, ranging from 0 to 100 MB, with a median
> size of 3-4 kB. Overall, that's around 350 TB of raw data to store, which isn't
> much, but that's across a *lot* of tiny files.
>
> We expect a growth rate of around a third per year, and the object size
> distribution to stay essentially the same (it's been stable for the past three
> years, and we don't see that changing).
>
> Our object access pattern is a very simple key -> value store, where the key
> happens to be the sha1 of the content we're storing. Any metadata is stored
> externally and we really only need a dumb object store.
>
> Our redundancy requirement is to be able to withstand the loss of 2 OSDs.
>
> After looking at our options for storage in Ceph, I dismissed (perhaps hastily)
> RGW for its metadata overhead, and went straight to plain RADOS. I've set up an
> erasure-coded storage pool with default settings, with k=5 and m=2 (expecting
> a 40% increase in storage use over plain contents).
>
> After storing objects in the pool, I see a storage usage of 700% instead of
> 140%. My understanding of the erasure code profile docs [1] is that objects
> that are below the stripe width (k * stripe_unit, which in my case is 20 kB)
> can't be chunked for erasure coding, which makes RADOS fall back to plain
> object copying, with k+m copies.
>
> [1] http://docs.ceph.com/docs/master/rados/operations/erasure-code-profile/
>
> Is my understanding correct? Does anyone have experience with this kind of
> storage workload in Ceph?
>
> If my understanding is correct, I'll end up adding size tiering on my object
> storage layer, shuffling objects into two pools with different settings
> according to their size. That's not too bad, but I'd like to make sure I'm
> not completely misunderstanding something.
>
> Thanks!
> --
> Nicolas Dandrimont
> Backend Engineer, Software Heritage
>
> BOFH excuse #170:
> popper unable to process jumbo kernel

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
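P.S.: Regarding the size-tiering idea from your mail, a minimal sketch of how that could look with the python-rados bindings is below. It's untested, the pool names are placeholders, and the threshold is simply k * min_alloc_size for HDDs; adjust to your setup.

    import rados

    SMALL_POOL = "objects-replicated"   # placeholder: replicated pool for tiny objects
    LARGE_POOL = "objects-ec"           # placeholder: k=5, m=2 EC pool for the rest
    THRESHOLD = 5 * 64 * 1024           # k * min_alloc_size on HDDs (320 kB)

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    small_ioctx = cluster.open_ioctx(SMALL_POOL)
    large_ioctx = cluster.open_ioctx(LARGE_POOL)

    def put(key, data):
        """Route an object to the replicated or the EC pool based on its size."""
        ioctx = small_ioctx if len(data) < THRESHOLD else large_ioctx
        ioctx.write_full(key, data)

    def get(key):
        """Look the object up in the small-object pool first, then the EC pool."""
        for ioctx in (small_ioctx, large_ioctx):
            try:
                size, _ = ioctx.stat(key)
                return ioctx.read(key, size)
            except rados.ObjectNotFound:
                continue
        raise KeyError(key)

Since your keys are content hashes, the read side could also avoid the two-pool probe by recording the pool choice in your external metadata, but that's a design decision on your end.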