Hello Igor

>> It looks like you're right.
>>
>> ceph tell osd.1750 status
>> {
>>     "cluster_fsid": "bec60cda-a306-11ed-abd9-75488d4e8f4a",
>>     "osd_fsid": "388906f8-1df8-45a2-9895-067ee2e0c055",
>>     "whoami": 1750,
>>     "state": "active",
>>     "maps": "[1502335~261370]",
>>     "oldest_map": "1502335",
>>     "newest_map": "1763704",
>>     "cluster_osdmap_trim_lower_bound": 1502335,
>>     "num_pgs": 0
>> }
>>
>> One of the OSDs I've checked has about 261K maps
>> Could this cause bluestore to grow to ~380GiB?
>
> You'd better check object sizes/counts in the meta pool using
> ceph-objectstore-tool and estimate the totals:
>
> ceph-objectstore-tool --data-path <path-to-osd> --op meta-list > meta_list; cat meta_list | wc
>
> ceph-objectstore-tool --data-path <path-to-osd> --pgid meta <oid> dump | grep size
>
> The latter command shows the onode size for a given object - just use a
> few oids from the meta_list file for the relevant onode types (osdmap* and
> inc_osdmap* are of particular interest; other types are worth checking if
> they appear in bulky counts).
>

There are a lot of objects on the OSD.
ceph-objectstore-tool --data-path ./osd.1750 --op meta-list | wc -l
528023
(Almost all of them are osdmap or inc_osdmap.)
On a smaller, healthy cluster the same command gives me about 1400 objects.
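For context, here is a rough breakdown of the meta collection by object type (a minimal sketch; meta_list is the file name from the command quoted above, and the grep patterns assume each line is a JSON oid like the ones shown further down):

# Save the listing once, then count the two dominant object types.
ceph-objectstore-tool --data-path ./osd.1750 --op meta-list > meta_list
grep -c '"oid":"osdmap\.' meta_list      # full osdmap epochs
grep -c '"oid":"inc_osdmap\.' meta_list  # incremental osdmap epochs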


I've checked a few osdmap.* objects; they are also 6-7 times bigger than the
osdmap.* objects on the smaller, healthy cluster.

examples:
ceph-objectstore-tool --data-path ./osd.1750 --pgid meta '{"oid":"osdmap.1612379","key":"","snapid":0,"hash":42339
99933,"max":0,"pool":-1,"namespace":"","max":0}' dump |grep size
Error getting attr on : meta,#-1:bc6dba3f:::osdmap.1612379:0#, (61) No data available
        "size": 1622339,
        "blksize": 4096,
        "size": 1622339,
        "expected_object_size": 0,
        "expected_write_size": 0,

ceph-objectstore-tool --data-path ./osd.1750 --pgid meta '{"oid":"osdmap.1524595","key":"","snapid":0,"hash":4208913981,"max":0,"pool":-1,"namespace":"","max":0}' dump |grep size Error getting attr on : meta,#-1:bc777b5f:::osdmap.1524595:0#, (61) No data available
        "size": 1922137,
        "blksize": 4096,
        "size": 1922137,
        "expected_object_size": 0,
        "expected_write_size": 0,


>> What settings could affect the number of osdmaps stored on an OSD?
>> The only thing that comes to mind is mon_min_osdmap_epochs, which I
>> configured to 2000 a while ago.
>>
> osdmaps should be trimmed automatically in a healthy cluster. Perhaps
> an ongoing rebalancing, or some other issue, prevents that. The first
> question would be how the osdmap epochs evolve: is oldest_map increasing?
> Is the delta decreasing?
>
oldest_map is not increasing.
The delta has increased slightly since yesterday, by about 2763 epochs.
SSD OSD usage increased by about 1% yesterday.
We are quite close to finishing the backfill.
I expect the cluster to reach active+clean before the OSDs fill up.
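(For the record, this is how I'm watching the delta - just a quick sketch assuming jq is available; the epoch fields come back as strings, hence the tonumber.)

# Oldest/newest epochs and their delta for one OSD.
ceph tell osd.1750 status | jq '{oldest: (.oldest_map|tonumber),
                                 newest: (.newest_map|tonumber),
                                 delta:  ((.newest_map|tonumber) - (.oldest_map|tonumber))}'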


  data:
    volumes: 1/1 healthy
    pools:   8 pools, 20641 pgs
    objects: 2.15G objects, 7.6 PiB
    usage:   11 PiB used, 16 PiB / 27 PiB avail
    pgs:     196301608/17155230197 objects misplaced (1.144%)
             17576 active+clean
             1723  active+clean+scrubbing
             779   active+clean+scrubbing+deep
             381   active+remapped+backfill_wait
             182   active+remapped+backfilling

  io:
    client:   232 B/s rd, 0 op/s rd, 0 op/s wr
    recovery: 2.4 GiB/s, 639 objects/s


I guess we will wait the 3-4 days it should take for the recovery to finish, and I will update you once the cluster is HEALTH_OK if anything new comes up.

Is there a way to force epoch/osdmap trimming on a rebalancing cluster?
AFAIK there is no way to do it. It would be nice if we had such an ability. This is not the first time we have hit this class of issue (big mon DBs). I would consider it a major flaw of Ceph: the cluster cannot get healthier until it is already in a perfectly healthy state.
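For now the best I can do is double-check the relevant knobs on the mon side (a sketch; mon_min_osdmap_epochs is the option I mentioned above, and if I remember correctly `ceph report` exposes the mons' committed osdmap range as osdmap_first_committed/osdmap_last_committed - treat those field names as an assumption):

# Current value of the trim-related option I changed earlier.
ceph config get mon mon_min_osdmap_epochs

# What the monitors currently consider their committed osdmap range.
ceph report 2>/dev/null | jq '{first: .osdmap_first_committed, last: .osdmap_last_committed}'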

Best regards
Adam Prycki

