Greg, Paul,

Thank you for the feedback. This has been very enlightening. One last question (for now, at least): are there any expected performance impacts from having a single client do I/O to multiple pools? (Given how RGW and CephFS store metadata, I would hope not, but I thought I'd ask.) Based on everything that has been described, it makes sense to put metadata-heavy objects (i.e., objects where key-value data makes up a large fraction of the total) in a replicated pool and the large blobs in an EC pool.
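A minimal sketch of the pool split described above, assuming a client that decides per object which pool to write to. The pool names, the `ObjectPlan` type, and the `kv_threshold` and `min_ec_bytes` cutoffs are hypothetical illustrations, not Ceph settings. The sketch encodes only the points made in this thread: omap is not supported on EC pools, small xattrs are fine there, and small objects gain little from erasure coding.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical pool names; substitute the pools you actually create.
REPLICATED_POOL = "meta-replicated"
EC_POOL = "data-ec"


@dataclass
class ObjectPlan:
    """Hypothetical description of one object a client intends to write."""
    name: str
    data_bytes: int
    omap: Dict[str, bytes] = field(default_factory=dict)
    xattrs: Dict[str, bytes] = field(default_factory=dict)


def _kv_bytes(kv: Dict[str, bytes]) -> int:
    # Sum of key and value sizes; ignores any per-entry storage overhead.
    return sum(len(k.encode("utf-8")) + len(v) for k, v in kv.items())


def kv_fraction(obj: ObjectPlan) -> float:
    """Fraction of the object's bytes that are key-value data (omap + xattrs)."""
    kv = _kv_bytes(obj.omap) + _kv_bytes(obj.xattrs)
    total = kv + obj.data_bytes
    return kv / total if total else 0.0


def choose_pool(obj: ObjectPlan,
                kv_threshold: float = 0.5,      # hypothetical cutoff
                min_ec_bytes: int = 64 * 1024,  # hypothetical cutoff
                ) -> str:
    # omap is not supported on EC pools, so any omap use forces replication.
    if obj.omap:
        return REPLICATED_POOL
    # Small objects see little or no space benefit from erasure coding.
    if obj.data_bytes < min_ec_bytes:
        return REPLICATED_POOL
    # Metadata-heavy objects (large key-value fraction) stay replicated.
    if kv_fraction(obj) >= kv_threshold:
        return REPLICATED_POOL
    # Large blobs carrying at most small xattrs go to the EC pool.
    return EC_POOL
```

For example, `choose_pool(ObjectPlan("video.bin", 8 * 1024 * 1024, xattrs={"etag": b"abc"}))` picks the EC pool, while any object with omap entries lands in the replicated pool. In a real client, each returned pool name would map to its own I/O context (e.g., `cluster.open_ioctx(...)` in the `rados` Python binding).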
Thanks again,
Ben

On Wed, Sep 12, 2018 at 1:05 PM Gregory Farnum <gfar...@redhat.com> wrote:

> On Tue, Sep 11, 2018 at 5:32 PM Benjamin Cherian <benjamin.cher...@gmail.com> wrote:
>
>> Ok, that's good to know. I was planning on using an EC pool. Maybe I'll
>> store some of the larger kv pairs in their own objects or move the metadata
>> into its own replicated pool entirely. If the storage mechanism is the
>> same, is there a reason xattrs are supported and omap is not? (Or is there
>> some hidden cost to storing kv pairs in an EC pool I'm unaware of, e.g.,
>> does the kv data get replicated across all OSDs being used for a PG or
>> something?)
>>
>
> Yeah, if you're on an EC pool there isn't a good way to erasure-code
> key-value data. So we willingly replicate xattrs across all the nodes
> (since they are presumed to be small and limited in number; I think we
> actually have total limits, but not sure?) but don't support omap at all
> (as it's presumed to be a lot of data).
>
> Do note that if small objects are a large proportion of your data, you
> might prefer to put them in a replicated pool: in an EC pool you'd need
> very small chunk sizes to get any non-replication happening anyway, and for
> something in the 10KB range at a reasonable k+m you'd be dominated by
> metadata size anyway.
> -Greg
>
>>
>> Thanks,
>> Ben
>>
>> On Tue, Sep 11, 2018 at 1:46 PM Patrick Donnelly <pdonn...@redhat.com> wrote:
>>
>>> On Tue, Sep 11, 2018 at 12:43 PM, Benjamin Cherian
>>> <benjamin.cher...@gmail.com> wrote:
>>> > On Tue, Sep 11, 2018 at 10:44 AM Gregory Farnum <gfar...@redhat.com> wrote:
>>> >>
>>> >> <snip>
>>> >> In general, if the key-value storage is of unpredictable or non-trivial
>>> >> size, you should use omap.
>>> >>
>>> >> At the bottom layer where the data is actually stored, they're likely to
>>> >> be in the same places (if using BlueStore, they are the same; in FileStore,
>>> >> a rados xattr *might* be in the local FS xattrs, or it might not). It is
>>> >> somewhat more likely that something stored in an xattr will get pulled into
>>> >> memory at the same time as the object's internal metadata, but that only
>>> >> happens if it's quite small (think the xfs or ext4 xattr rules).
>>> >
>>> >
>>> > Based on this description, if I'm planning on using BlueStore, there is no
>>> > particular reason to ever prefer using xattrs over omap (outside of ease of
>>> > use in the API), correct?
>>>
>>> You may prefer xattrs on BlueStore if the metadata is small and you
>>> may need to store the xattrs on an EC pool. omap is not supported on
>>> EC pools.
>>>
>>> --
>>> Patrick Donnelly
>>>
>>
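Greg's point about objects in the 10KB range can be made concrete with a rough raw-capacity estimate. This is a simplified model, assuming each replica or EC shard is rounded up to a fixed allocation unit and carries a fixed per-shard metadata cost; `alloc_unit` and `per_shard_meta` are illustrative assumptions, not measured Ceph values, and real EC pools also pad chunks to the pool's stripe unit.

```python
def round_up(n: int, unit: int) -> int:
    """Round n up to a multiple of unit (integer ceiling)."""
    return -(-n // unit) * unit


def replicated_raw_bytes(obj_bytes: int, size: int = 3,
                         alloc_unit: int = 4096, per_shard_meta: int = 0) -> int:
    # Every replica stores the whole object plus its own metadata.
    return size * (round_up(obj_bytes, alloc_unit) + per_shard_meta)


def ec_raw_bytes(obj_bytes: int, k: int = 4, m: int = 2,
                 alloc_unit: int = 4096, per_shard_meta: int = 0) -> int:
    # Data is split across k data shards; the m coding shards are the same size.
    chunk = -(-obj_bytes // k)
    # Each of the k + m shards is allocated separately and carries its own metadata.
    return (k + m) * (round_up(chunk, alloc_unit) + per_shard_meta)


if __name__ == "__main__":
    cases = [("10 KiB, 4 KiB unit", 10 * 1024, 4096),
             ("10 KiB, 64 KiB unit", 10 * 1024, 64 * 1024),
             ("4 MiB, 4 KiB unit", 4 * 1024 * 1024, 4096)]
    for label, size, au in cases:
        rep = replicated_raw_bytes(size, alloc_unit=au)
        ec = ec_raw_bytes(size, alloc_unit=au)
        print(f"{label}: replicated x3 = {rep}, EC 4+2 = {ec}")
```

Under these assumptions a 10 KiB object costs 36864 raw bytes with 3x replication versus 24576 as 4+2 EC, but with a 64 KiB allocation unit the EC layout costs 393216 versus 196608, worse than replication; a 4 MiB object gets the expected 1.5x versus 3x ratio. The per-shard metadata term is multiplied by k + m rather than by the replica count, which is one way to see why small EC objects end up dominated by metadata.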
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com