On Tue, May 23, 2017 at 6:28 AM <george.vasilaka...@stfc.ac.uk> wrote:
> Hi Wido,
>
> I see your point. I would expect OMAPs to grow with the number of
> objects, but multiple OSDs getting to multiple tens of GBs for their
> omaps seems excessive. I find it difficult to believe that not sharding
> the index for a bucket of 500k objects in RGW causes the 10 largest OSD
> omaps to grow to a total of 512GB, which is about 2000 times greater
> than the size of 10 average omaps. Given the relative usage of our
> pools and the much greater prominence of our non-RGW pools on the OSDs
> with huge omaps, I'm not inclined to think this is caused by some RGW
> configuration (or lack thereof).
>
> It's also worth pointing out that we've seen problems with files being
> slow to retrieve (I'm talking about rados get doing 120MB/sec on one
> file and 2MB/sec on another) and, subsequently, the omap of the OSD
> hosting the first stripe of those growing from 30MB to 5GB in the span
> of an hour, during which the logs are flooded with LevelDB compaction
> activity.

This does sound weird, but I also notice that in your earlier email you
seemed to have only ~5k PGs across ~1400 OSDs, which is a pretty low
number. You may just have a truly horrible PG balance; can you share
more details (e.g. ceph osd df)?
-Greg

> Best regards,
>
> George
> ________________________________________
> From: Wido den Hollander [w...@42on.com]
> Sent: 23 May 2017 14:00
> To: Vasilakakos, George (STFC,RAL,SC); ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Large OSD omap directories (LevelDBs)
>
> > On 23 May 2017 at 13:01, george.vasilaka...@stfc.ac.uk wrote:
> >
> > > > Your RGW buckets, how many objects in them, and do they have the
> > > > index sharded?
> > > >
> > > > I know we have some very large & old buckets (10M+ RGW objects in
> > > > a single bucket), with correspondingly large OMAPs wherever that
> > > > bucket index is living (sufficiently large that trying to list
> > > > the entire thing online is fruitless). Ceph's pgmap status says
> > > > we have 2G RADOS objects, however, and you're only at 61M RADOS
> > > > objects.
> > >
> > > According to radosgw-admin bucket stats, the most populous bucket
> > > contains 568101 objects. There is no index sharding. The
> > > default.rgw.buckets.data pool contains 4162566 objects; I think
> > > striping is done by default with 4MB stripes.
> >
> > Without index sharding, 500k objects in a bucket can already cause
> > larger OMAP directories. I'd recommend that you at least start to
> > shard them.
> >
> > Wido
> >
> > > Bear in mind RGW is a small use case for us currently.
> > > Most of the data lives in a pool that is accessed by specialized
> > > servers that have plugins based on libradosstriper. That pool
> > > stores around 1.8 PB in 32920055 objects.
> > >
> > > One thing of note is that we have this:
> > > filestore_xattr_use_omap=1
> > > in our ceph.conf, and libradosstriper makes use of xattrs for
> > > striping metadata and locking mechanisms.
> > >
> > > This option seems to have been removed some time ago, but the
> > > question is: could it have any effect? This cluster was built in
> > > January and ran Jewel initially.
> > >
> > > I do see the xattrs in XFS, but a sampling of an omap dir from an
> > > OSD looked like there might be some xattrs in there too.
> > >
> > > I'm going to try restarting an OSD with a big omap and also
> > > extracting a copy of one for further inspection. It seems to me
> > > like they might not be cleaning up old data. I'm fairly certain an
> > > active cluster would've compacted enough for 3-month-old SSTs to go
> > > away.
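
A note on the PG numbers: with ~5k PGs across ~1400 OSDs, even assuming
plain 3x replication purely for the arithmetic, that is only about
5000 x 3 / 1400 ≈ 11 PG copies per OSD, well below the commonly cited
target of roughly 100 per OSD, so placement of both data and omap-heavy
PGs can end up very uneven. A minimal sketch of how to look at the
distribution and the on-disk omap sizes; osd.0 and the /var/lib/ceph
path below are only examples and assume a default filestore layout:

    # Per-OSD utilisation, variance and PG counts (what Greg asks for):
    ceph osd df
    ceph osd df tree

    # Size of one filestore OSD's LevelDB omap directory:
    du -sh /var/lib/ceph/osd/ceph-0/current/omap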
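
On the index-sharding suggestion, a rough sketch, assuming a
radosgw-admin new enough to have offline resharding (it is not present
in early Jewel releases); the bucket name and shard count are
placeholders, with the shard count picked so each shard stays around or
below the commonly cited ~100k objects (568101 objects suggests roughly
8 shards):

    # Confirm the object count for the large bucket:
    radosgw-admin bucket stats --bucket=my-bucket

    # Offline reshard of an existing bucket index into 8 shards;
    # writes to the bucket should be quiesced while this runs:
    radosgw-admin bucket reshard --bucket=my-bucket --num-shards=8

For buckets created in the future, rgw_override_bucket_index_max_shards
can be set in the RGW clients' ceph.conf section so new indices start
out sharded.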
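
For extracting and inspecting a large omap offline, a sketch assuming a
systemd deployment, filestore OSDs under the default path, and osd.0 as
a placeholder; note that the ceph-kvstore-tool invocation differs
between releases (newer ones take the store type, e.g. leveldb, as the
first argument, older ones take only the path):

    # Stop the OSD and copy its omap LevelDB somewhere safe:
    systemctl stop ceph-osd@0
    rsync -a /var/lib/ceph/osd/ceph-0/current/omap/ /root/omap-osd0-copy/
    systemctl start ceph-osd@0

    # Count and peek at the keys in the copy to see what dominates:
    ceph-kvstore-tool leveldb /root/omap-osd0-copy list | wc -l
    ceph-kvstore-tool leveldb /root/omap-osd0-copy list | head -n 50

    # To force a full LevelDB compaction when the OSD next starts,
    # set this under [osd] in ceph.conf and restart the daemon:
    #   leveldb_compact_on_mount = true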
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com