Hi,

You are probably running into BlueStore's min_alloc_size, which defaults to
64 kB on HDDs and 16 kB on SSDs. With k=5, m=2, you'd need objects of at
least 5 * 64 kB = 320 kB on HDDs (or 5 * 16 kB = 80 kB on SSDs) to use the
space efficiently.
Last time I checked, these values are fixed at OSD creation time and cannot
be changed afterwards.
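
To make the overhead concrete (back-of-the-envelope math, not something
I've measured on your cluster): a 4 kB object in a k=5, m=2 pool is split
into 5 data chunks plus 2 coding chunks, and each chunk occupies at least
one allocation unit on disk. On HDD OSDs that's

    7 chunks * 64 kB = 448 kB of raw space for 4 kB of data

You can check the values your OSDs are running with via the admin socket
(osd.0 is just an example ID):

    ceph daemon osd.0 config get bluestore_min_alloc_size_hdd
    ceph daemon osd.0 config get bluestore_min_alloc_size_ssd

If you need a smaller allocation unit, it has to go into ceph.conf before
the OSDs are created, e.g.:

    [osd]
    bluestore_min_alloc_size_hdd = 4096

(untested here, and small allocation units have a performance cost on HDDs).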

It's not necessarily the best idea to store a lot of very small objects in
RADOS (or CephFS or RGW), but it really depends on your exact requirements
and access pattern.


Paul


2018-06-27 11:32 GMT+02:00 Nicolas Dandrimont <ol...@softwareheritage.org>:

> Hi,
>
> I would like to use Ceph to store a lot of small objects. Our current
> usage pattern is 4.5 billion unique objects, ranging from 0 to 100 MB,
> with a median size of 3-4 kB. Overall, that's around 350 TB of raw data
> to store, which isn't much, but it's spread across a *lot* of tiny files.
>
> We expect growth of around a third per year, and the object size
> distribution to stay essentially the same (it's been stable for the past
> three years, and we don't see that changing).
>
> Our object access pattern is a very simple key -> value store, where the
> key happens to be the sha1 of the content we're storing. All metadata is
> stored externally, and we really only need a dumb object store.
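>
> For reference, the whole access pattern boils down to something like this
> (just a sketch; "contents" is a placeholder pool name):
>
>     key=$(sha1sum ./blob | cut -d' ' -f1)
>     rados -p contents put "$key" ./blob
>     rados -p contents get "$key" ./blob.out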
>
> Our redundancy requirement is to be able to withstand the loss of 2 OSDs.
>
> After looking at our options for storage in Ceph, I dismissed (perhaps
> hastily) RGW for its metadata overhead, and went straight to plain RADOS.
> I've set up an erasure-coded pool with default settings and k=5, m=2
> (expecting a 40% increase in storage use over the plain contents).
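>
> For completeness, the pool was created roughly like this; the profile and
> pool names, and the PG count, are illustrative:
>
>     ceph osd erasure-code-profile set sh-ec-profile k=5 m=2
>     ceph osd pool create contents 1024 1024 erasure sh-ec-profile
>
> with everything else left at the defaults.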
>
> After storing objects in the pool, I see a storage usage of 700% instead
> of the expected 140%. My understanding of the erasure-code-profile docs[1]
> is that objects below the stripe width (k * stripe_unit, which in my case
> is 20 kB) can't be chunked for erasure coding, which makes RADOS fall back
> to plain object copying, with k+m copies.
>
> [1] http://docs.ceph.com/docs/master/rados/operations/erasure-code-profile/
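>
> (The 20 kB comes from the default stripe_unit of 4 kB times k=5; the
> resulting stripe_width is visible per pool with
>
>     ceph osd pool ls detail | grep stripe_width
>
> if that helps anyone reproduce the math.)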
>
> Is my understanding correct? Does anyone have experience with this kind of
> storage workload in Ceph?
>
> If my understanding is correct, I'll end up adding size tiering to my
> object storage layer, sorting objects into two pools with different
> settings according to their size. That's not too bad, but I'd like to
> make sure I'm not completely misunderstanding something.
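>
> Concretely, I'm thinking of a wrapper along these lines (a sketch; the
> pool names and the 20 kB cutoff are placeholders):
>
>     #!/bin/sh
>     # route each object by size: small -> replicated pool, large -> EC pool
>     f="$1"
>     key=$(sha1sum "$f" | cut -d' ' -f1)
>     if [ "$(stat -c%s "$f")" -lt 20480 ]; then
>         rados -p contents-small put "$key" "$f"
>     else
>         rados -p contents-ec put "$key" "$f"
>     fi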
>
> Thanks!
> --
> Nicolas Dandrimont
> Backend Engineer, Software Heritage
>
> BOFH excuse #170:
> popper unable to process jumbo kernel
>


-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com