On Tue, May 23, 2017 at 6:28 AM <george.vasilaka...@stfc.ac.uk> wrote:

> Hi Wido,
>
> I see your point. I would expect OMAPs to grow with the number of objects
> but multiple OSDs getting to multiple tens of GBs for their omaps seems
> excessive. I find it difficult to believe that not sharding the index for a
> bucket of 500k objects in RGW causes the 10 largest OSD omaps to grow to a
> total of 512GB, which is about 2000 times the size of 10 average omaps.
> Given the relative usage of our pools and the much greater prominence of
> our non-RGW pools on the OSDs with huge omaps I'm not inclined to think
> this is caused by some RGW configuration (or lack thereof).
>
> It's also worth pointing out that we've seen problems with files being
> slow to retrieve (I'm talking about rados get doing 120MB/sec on one file
> and 2MB/sec on another) and, subsequently, the omap of the OSD hosting the
> first stripe of such a file growing from 30MB to 5GB within an hour, during
> which the logs are flooded with LevelDB compaction activity.
>

This does sound weird, but I also notice that in your earlier email you
seemed to have only ~5k PGs across ~1400 OSDs, which is a pretty low
number. You may just have a truly horrible PG balance; can you share more
details (e.g. ceph osd df)?
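
If it helps, here is one quick way to summarise the spread -- a rough
sketch that assumes the JSON output of ceph osd df has a "nodes" list with
a "pgs" field per OSD, as it does on the releases I've looked at:

  ceph osd df -f json | python -c 'import json,sys; p=[n["pgs"] for n in json.load(sys.stdin)["nodes"]]; print("PGs/OSD min %d max %d avg %.1f" % (min(p), max(p), sum(p)/float(len(p))))'

A large gap between min and max would fit the kind of imbalance I'm
guessing at.
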
-Greg


>
> Best regards,
>
> George
> ________________________________________
> From: Wido den Hollander [w...@42on.com]
> Sent: 23 May 2017 14:00
> To: Vasilakakos, George (STFC,RAL,SC); ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Large OSD omap directories (LevelDBs)
>
> > On 23 May 2017 at 13:01, george.vasilaka...@stfc.ac.uk wrote:
> >
> >
> > > Your RGW buckets, how many objects in them, and do they have the index
> > > sharded?
> >
> > > I know we have some very large & old buckets (10M+ RGW objects in a
> > > single bucket), with correspondingly large OMAPs wherever that bucket
> > > index is living (sufficiently large that trying to list the entire thing
> > > online is fruitless). Ceph's pgmap status says we have 2G RADOS objects,
> > > however, and you're only at 61M RADOS objects.
> >
> >
> > According to radosgw-admin bucket stats the most populous bucket
> contains 568101 objects. There is no index sharding. The
> default.rgw.buckets.data pool contains 4162566 objects; I think striping is
> done by default with 4MB stripes.
> >
>
> Without index sharding 500k objects in a bucket can already cause larger
> OMAP directories. I'd recommend that you at least start to shard them.
>
> Wido
>
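
(For what it's worth, a sketch of how sharding gets turned on -- exact
availability depends on the release you're running, and <instance>,
<bucket-name> and the shard count are placeholders:

  # ceph.conf on the RGW hosts; applies to newly created buckets only
  [client.rgw.<instance>]
  rgw override bucket index max shards = 16

  # for existing buckets, if your radosgw-admin has the offline reshard command
  radosgw-admin bucket reshard --bucket=<bucket-name> --num-shards=16

The offline reshard needs writes to the bucket stopped while it runs.)
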
> > Bear in mind RGW is a small use case for us currently.
> > Most of the data lives in a pool that is accessed by specialized servers
> that have plugins based on libradosstriper. That pool stores around 1.8 PB
> in 32920055 objects.
> >
> > One thing of note is that we have this:
> > filestore_xattr_use_omap=1
> > in our ceph.conf, and libradosstriper makes use of xattrs for its striping
> metadata and locking mechanisms.
> >
> > This option seems to have been removed some time ago, but the question is:
> could it have any effect? This cluster was built in January and ran Jewel
> initially.
> >
> > I do see the xattrs in XFS, but a sampling of an omap dir from an OSD
> suggested there might be some xattrs in there too.
> >
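
(One way to see exactly what the striper attaches per object -- a rough
sketch, with <pool> and <object> as placeholders:

  rados -p <pool> listxattr <object>      # xattrs set by the clients
  rados -p <pool> listomapkeys <object>   # any per-object omap keys

and on the OSD itself, getfattr -d -m . -e hex on the object's file under
current/ shows which of those actually landed in XFS xattrs.)
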
> > I'm going to try restarting an OSD with a big omap and also extracting a
> copy of one for further inspection.
> > It seems to me like the LevelDBs might not be cleaning up old data. I'm
> fairly certain an active cluster would have compacted enough by now for
> 3-month-old SSTs to go away.
> >
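
(A quick way to size those up and test that theory -- a rough sketch; the
ceph-kvstore-tool invocation varies a bit between releases, so check its
--help first, and only point it at a copy or at a stopped OSD:

  # filestore keeps the omap LevelDB under each OSD's current/omap
  du -sh /var/lib/ceph/osd/ceph-*/current/omap | sort -h

  # on the extracted copy: list a few keys, then compact and re-check du
  ceph-kvstore-tool leveldb /path/to/omap-copy list | head
  ceph-kvstore-tool leveldb /path/to/omap-copy compact

If the copy shrinks a lot after a manual compact, that points at stale SSTs
rather than live data.)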
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
