The WAL/DB volume was set up as part of the original OSD deployment (not added later).

OSD is running 14.2.9.

Would grabbing the ceph-kvstore-tool bluestore-kv <path-to-osd> stats output, as in that ticket, be of any use here?
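
If it would help, I can grab that; my assumption is the OSD has to be stopped first so the tool can open the store at the mounted path, i.e. something along these lines (with osd.36 as the example):

systemctl stop ceph-osd@36
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-36 stats
systemctl start ceph-osd@36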

Thanks,

Reed

> On Jun 5, 2020, at 5:27 PM, Igor Fedotov <ifedo...@suse.de> wrote:
> 
> This might help - see comment #4 at https://tracker.ceph.com/issues/44509
> 
> And just for the sake of information collection - what Ceph version is used 
> in this cluster?
> 
> Did you set up the DB volume along with the OSD deployment, or was it added later, as was done in the ticket above?
> 
> 
> 
> Thanks,
> 
> Igor
> 
> On 6/6/2020 1:07 AM, Reed Dier wrote:
>> I'm going to piggyback on this somewhat.
>> 
>> I've battled RocksDB spillovers over the course of the life of the cluster since moving to BlueStore; however, I have always been able to compact them down well enough.
>> 
>> But now I am stumped at getting this OSD to compact via ceph tell osd.$osd compact, which has always worked in the past.
>> 
>> No matter how many times I compact it, it always spills over exactly 192 KiB.
>>> BLUEFS_SPILLOVER BlueFS spillover detected on 1 OSD(s)
>>>      osd.36 spilled over 192 KiB metadata from 'db' device (26 GiB used of 34 GiB) to slow device
>>>      osd.36 spilled over 192 KiB metadata from 'db' device (16 GiB used of 34 GiB) to slow device
>>>      osd.36 spilled over 192 KiB metadata from 'db' device (22 GiB used of 34 GiB) to slow device
>>>      osd.36 spilled over 192 KiB metadata from 'db' device (13 GiB used of 34 GiB) to slow device
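>> 
>> If raw byte counts would be more useful than the health summary, I can also pull the BlueFS numbers off the admin socket, e.g. something like:
>> 
>> ceph daemon osd.36 perf dump bluefs | grep -E '(db|slow)_(total|used)_bytes'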
>> 
>> The multiple entries are from different attempts at compacting it over time.
>> 
>> The OSD is a 1.92TB SATA SSD, the WAL/DB is a 36GB partition on NVMe.
>> I tailed and tee'd the OSD's logs during a manual compaction here: https://pastebin.com/bcpcRGEe
>> This is with the normal logging level.
>> I have no idea how to make heads or tails of that log data, but maybe 
>> someone can figure out why this one OSD just refuses to compact?
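>> 
>> If more verbose logging would help, I'm happy to re-run the compaction with the debug levels cranked up on that OSD, presumably something along the lines of:
>> 
>> ceph daemon osd.36 config set debug_bluefs 20
>> ceph daemon osd.36 config set debug_bluestore 20
>> ceph daemon osd.36 config set debug_rocksdb 5
>> ceph tell osd.36 compact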
>> 
>> OSD is 14.2.9.
>> OS is Ubuntu 18.04.
>> Kernel is 4.15.0-96.
>> 
>> I haven't played with ceph-bluestore-tool or ceph-kvstore-tool, but after seeing the mention above in this thread, I do see a ceph-kvstore-tool <rocksdb|bluestore-kv?> compact subcommand, which sounds like it may be the same thing that ceph tell osd.$osd compact does under the hood?
>>> compact
>>> Subcommand compact is used to compact all data of kvstore. It will open the 
>>> database, and trigger a database's compaction. After compaction, some disk 
>>> space may be released.
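>> 
>> If that's worth a shot, I assume it would have to be run offline against the mounted OSD directory, i.e. with the OSD stopped:
>> 
>> ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-36 compact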
>> 
>> 
>> Also, not sure if this is helpful:
>>> osd.36 spilled over 192 KiB metadata from 'db' device (13 GiB used of 34 GiB) to slow device
>>> ID   CLASS WEIGHT    REWEIGHT SIZE    RAW USE  DATA    OMAP    META    AVAIL   %USE  VAR  PGS STATUS TYPE NAME
>>>   36   ssd   1.77879  1.00000 1.8 TiB  1.2 TiB 1.2 TiB 6.2 GiB 7.2 GiB 603 GiB 66.88 0.94  85     up             osd.36
>> You can see the breakdown between OMAP data and META data.
>> 
>> After compacting again:
>>> osd.36 spilled over 192 KiB metadata from 'db' device (26 GiB used of 34 GiB) to slow device
>>> ID   CLASS WEIGHT    REWEIGHT SIZE    RAW USE  DATA    OMAP    META    AVAIL   %USE  VAR  PGS STATUS TYPE NAME
>>>   36   ssd   1.77879  1.00000 1.8 TiB  1.2 TiB 1.2 TiB 6.2 GiB  20 GiB 603 GiB 66.88 0.94  85     up             osd.36
>> 
>> So the OMAP size remained the same, while the metadata ballooned (while still conspicuously spilling over exactly 192 KiB).
>> These OSDs have a few RBD images, CephFS metadata, and librados objects (not RGW) stored on them.
>> 
>> The breakdown of OMAP size is pretty widely binned, but the GiB sizes are 
>> definitely the minority.
>> Looking at the breakdown with some simple bash-fu (rough one-liner below the table):
>> KiB = 147
>> MiB = 105
>> GiB = 24
>> 
>> To further divide that, all of the GiB-sized OMAPs are on SSD OSDs:
>> 
>>        SSD   HDD   TOTAL
>> KiB      0   147     147
>> MiB     36    69     105
>> GiB     24     0      24
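>> 
>> For reference, the counts above are just some quick awk over the ceph osd df output, roughly along these lines (assuming the column layout shown above, where the OMAP unit lands in field 12):
>> 
>> ceph osd df | awk '$2 ~ /^(ssd|hdd)$/ {print $2, $12}' | sort | uniq -c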
>> 
>> I have no idea if any of these data points are pertinent or helpful, but I 
>> want to give as clear a picture as possible to prevent chasing the wrong 
>> thread.
>> Appreciate any help with this.
>> 
>> Thanks,
>> Reed
>> 
>>> On May 26, 2020, at 9:48 AM, thoralf schulze <t.schu...@tu-berlin.de> wrote:
>>> 
>>> hi there,
>>> 
>>> trying to get my head around rocksdb spillovers and how to deal with
>>> them … in particular, i have one osd which does not have any pools
>>> associated with it (as per ceph pg ls-by-osd $osd), yet it does show up in
>>> ceph health detail as:
>>> 
>>>     osd.$osd spilled over 2.9 MiB metadata from 'db' device (49 MiB used of 37 GiB) to slow device
>>> 
>>> compaction doesn't help. i am well aware of
>>> https://tracker.ceph.com/issues/38745, yet i find it really
>>> counter-intuitive that an empty osd with a more-or-less optimally sized db
>>> volume can't fit its rocksdb on that volume.
>>> 
>>> is there any way to repair this, apart from re-creating the osd? fwiw,
>>> dumping the database with
>>> 
>>> ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-$osd dump > bluestore_kv.dump
>>> 
>>> yields a file of less than 100 MB in size.
>>> 
>>> and, while we're at it, a few more related questions:
>>> 
>>> - am i right to assume that the leveldb and rocksdb arguments to
>>> ceph-kvstore-tool are only relevant for osds with a filestore backend?
>>> - does ceph-kvstore-tool bluestore-kv … also deal with the rocksdb items of
>>> osds with a bluestore backend?
>>> 
>>> thank you very much & with kind regards,
>>> thoralf.
>>> 


_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
