Re: [ceph-users] Dealing with radosgw and large OSD LevelDBs: compact, start over, something else?

Haomai Wang Mon, 21 Dec 2015 07:16:09 -0800

On Mon, Dec 21, 2015 at 10:55 PM, Florian Haas <flor...@hastexo.com> wrote:


> On Mon, Dec 21, 2015 at 3:35 PM, Haomai Wang <hao...@xsky.com> wrote:
> >
> >
> > On Fri, Dec 18, 2015 at 1:16 AM, Florian Haas <flor...@hastexo.com> wrote:
> >>
> >> Hey everyone,
> >>
> >> I recently got my hands on a cluster that has been underperforming in
> >> terms of radosgw throughput, averaging about 60 PUTs/s with 70K
> >> objects where a freshly-installed cluster with near-identical
> >> configuration would do about 250 PUTs/s. (Neither of these values is
> >> what I'd consider high throughput, but this is just to give you a feel
> >> for the relative performance hit.)
> >>
> >> Some digging turned up that of the less than 200 buckets in the
> >> cluster, about 40 held in excess of a million objects (1-4M), with
> >> one bucket being an outlier with 45M objects. All buckets were created
> >> post-Hammer, and use 64 index shards. The total number of objects in
> >> radosgw is approx. 160M.
> >>
> >> Now this isn't a large cluster in terms of OSD distribution; there are
> >> only 12 OSDs (after all, we're only talking double-digit terabytes
> >> here). In almost all of these OSDs, the LevelDB omap directory has
> >> grown to a size of 10-20 GB.
> >>
> >> So I have several questions on this:
> >>
> >> - Is it correct to assume that such a large LevelDB would be quite
> >> detrimental to radosgw performance overall?
> >>
> >> - If so, would clearing that one large bucket and distributing the
> >> data over several new buckets reduce the LevelDB size at all?
> >>
> >> - Is there even something akin to "ceph mon compact" for OSDs?
> >>
> >> - Are these large LevelDB databases a simple consequence of having a
> >> combination of many radosgw objects and few OSDs, with the
> >> distribution per-bucket being comparatively irrelevant?
> >>
> >> I do understand that the 45M object bucket itself would have been a
> >> problem pre-Hammer, with no index sharding available. But with what
> >> others have shared here, a rule of thumb of one index shard per
> >> million objects should be a good one to follow, so 64 shards for 45M
> >> objects doesn't strike me as totally off the mark. That's why I think
> >> LevelDB I/O is actually the issue here. But I might be totally wrong;
> >> all insights appreciated. :)
> >
> >
> > Do you enable bucket index sharding?
>
> As stated above, yes. 64 shards.
>
> > I'm not sure where the bottleneck in your cluster is. I guess you could
> > disable leveldb compression to test whether that reduces the impact of
> > compaction.
>
> Hmmm, you mean with "leveldb_compression = false"? Could you explain
> why exactly *disabling* compression would help with large omaps?
>
> Also, would "osd_compact_leveldb_on_mount" (undocumented) help here?
> It looks to me like that is an option with no actual implementing
> code, but I may be missing something.
>
> The similarly named leveldb_compact_on_mount seems to only compact
> LevelDB data in LevelDBStore. But I may be mistaken there too, as that
> option also seems to be undocumented. Would configuring an osd with
> leveldb_compact_on_mount=true do omap compaction on OSD daemon
> startup, in a FileStore OSD?
>

I don't have enough information to be sure that this is the problem in your
case, but I have run into it before: LevelDB has only a single compaction
thread, and during compaction it spends a lot of time compressing and
decompressing data.

What's your version? I guess setting "leveldb_compression" or
"osd_leveldb_compression" to false can help.
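
To judge whether such a change makes any difference, it may help to record
the omap directory size on each OSD before and after. The following is only a
minimal sketch, not an official tool: it assumes FileStore OSDs mounted under
the default /var/lib/ceph/osd/<cluster>-<id> location with the LevelDB omap
data in current/omap, and the function names are made up for illustration.
Adjust the path if your layout differs.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A file may disappear mid-walk, e.g. when LevelDB
                # removes an old table file after a compaction.
                pass
    return total


def omap_sizes(osd_root="/var/lib/ceph/osd"):
    """Map each OSD directory name to the size of its current/omap dir.

    ASSUMPTION: default FileStore layout, <osd_root>/<cluster>-<id>/current/omap.
    """
    sizes = {}
    if not os.path.isdir(osd_root):
        return sizes
    for entry in sorted(os.listdir(osd_root)):
        omap = os.path.join(osd_root, entry, "current", "omap")
        if os.path.isdir(omap):
            sizes[entry] = dir_size_bytes(omap)
    return sizes


if __name__ == "__main__":
    for osd, size in omap_sizes().items():
        print("%s: %.1f GiB" % (osd, size / 2.0 ** 30))
```

Run it on each OSD host (it needs read access to the OSD data directories)
and compare the numbers over time; directory size alone says nothing about
compaction stalls, so treat it as a rough indicator only.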


>
> Cheers,
> Florian
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Best Regards,

Wheat
