I think I have figured out the issue.

POOL      SIZE     TARGET SIZE   RATE   RAW CAPACITY   RATIO    TARGET RATIO   PG_NUM   NEW PG_NUM   AUTOSCALE
images    28523G                 3.0    68779G         1.2441                    1000                warn
My images pool is 28523G with a replication level of 3, and there is a total of 68779G of raw capacity. According to the documentation (http://docs.ceph.com/docs/master/rados/operations/placement-groups/):

"*SIZE* is the amount of data stored in the pool. *TARGET SIZE*, if present, is the amount of data the administrator has specified that they expect to eventually be stored in this pool. The system uses the larger of the two values for its calculation. *RATE* is the multiplier for the pool that determines how much raw storage capacity is consumed. For example, a 3 replica pool will have a ratio of 3.0, while a k=4,m=2 erasure coded pool will have a ratio of 1.5. *RAW CAPACITY* is the total amount of raw storage capacity on the OSDs that are responsible for storing this pool’s (and perhaps other pools’) data. *RATIO* is the ratio of that total capacity that this pool is consuming (i.e., ratio = size * rate / raw capacity)."

So ratio = 28523G * 3.0 / 68779G = 1.2441, meaning I am oversubscribing by 1.2441x, hence the warning.

But looking at ceph df:

# ceph df
POOL      ID   STORED    OBJECTS   USED     %USED   MAX AVAIL
images     3   9.3 TiB     2.82M   28 TiB   57.94     6.7 TiB

I believe the 9.3 TiB is the amount actually stored with thin provisioning, versus a fully provisioned 28 TiB, and the raw capacity of the cluster is only sitting at about 50% used. Shouldn't the ratio be STORED (from ceph df) * RATE (from ceph osd pool autoscale-status) / RAW CAPACITY, since Ceph uses thin provisioning in rbd? Otherwise this ratio only works for people who don't thin provision, which goes against what Ceph is doing with rbd (http://docs.ceph.com/docs/master/rbd/).
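To put the two ways of computing that ratio side by side (rough numbers; the 68779G of raw capacity in autoscale-status is the same ~67 TiB total that ceph df reports below):

  ratio as computed today:  SIZE   * RATE / RAW CAPACITY = 28523G  * 3.0 / 68779G  ~ 1.24  (oversubscribed, hence the warning)
  ratio based on STORED:    STORED * RATE / RAW CAPACITY = 9.3 TiB * 3.0 / 67 TiB  ~ 0.42  (roughly in line with the ~48% raw usage the cluster actually reports)

The second number is what I would intuitively expect the autoscaler to show for a thinly provisioned rbd pool.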
On Wed, May 1, 2019 at 11:44 AM Joe Ryner <jry...@cait.org> wrote:

> I have found a little more information. When I turn off the pg_autoscaler
> the warning goes away; turn it back on and the warning comes back.
>
> I have run the following:
>
> # ceph osd pool autoscale-status
> POOL      SIZE     TARGET SIZE   RATE   RAW CAPACITY   RATIO    TARGET RATIO   PG_NUM   NEW PG_NUM   AUTOSCALE
> images    28523G                 3.0    68779G         1.2441                    1000                warn
> locks     676.5M                 3.0    68779G         0.0000                       8                warn
> rbd       0                      3.0    68779G         0.0000                       8                warn
> data      0                      3.0    68779G         0.0000                       8                warn
> metadata  3024k                  3.0    68779G         0.0000                       8                warn
>
> # ceph df
> RAW STORAGE:
>     CLASS     SIZE       AVAIL      USED       RAW USED     %RAW USED
>     hdd       51 TiB     26 TiB     24 TiB     24 TiB       48.15
>     ssd       17 TiB     8.5 TiB    8.1 TiB    8.1 TiB      48.69
>     TOTAL     67 TiB     35 TiB     32 TiB     32 TiB       48.28
>
> POOLS:
>     POOL         ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
>     data          0     0 B               0     0 B             0       6.7 TiB
>     metadata      1     6.3 KiB          21     3.0 MiB         0       6.7 TiB
>     rbd           2     0 B               2     0 B             0       6.7 TiB
>     images        3     9.3 TiB       2.82M     28 TiB      57.94       6.7 TiB
>     locks         4     215 MiB         517     677 MiB         0       6.7 TiB
>
> It looks to me like the numbers for the images pool are not right in the
> autoscale-status output.
>
> Below is the osd crush tree:
>
> # ceph osd crush tree
> ID  CLASS  WEIGHT    (compat)   TYPE NAME
> -1         66.73337             root default
> -3         22.28214  22.28214       rack marack
> -8          7.27475   7.27475           host abacus
> 19  hdd     1.81879   1.81879               osd.19
> 20  hdd     1.81879   1.42563               osd.20
> 21  hdd     1.81879   1.81879               osd.21
> 50  hdd     1.81839   1.81839               osd.50
> -10         7.76500   6.67049           host gold
>  7  hdd     0.86299   0.83659               osd.7
>  9  hdd     0.86299   0.78972               osd.9
> 10  hdd     0.86299   0.72031               osd.10
> 14  hdd     0.86299   0.65315               osd.14
> 15  hdd     0.86299   0.72586               osd.15
> 22  hdd     0.86299   0.80528               osd.22
> 23  hdd     0.86299   0.63741               osd.23
> 24  hdd     0.86299   0.77718               osd.24
> 25  hdd     0.86299   0.72499               osd.25
> -5          7.24239   7.24239           host hassium
>  0  hdd     1.80800   1.52536               osd.0
>  1  hdd     1.80800   1.65421               osd.1
> 26  hdd     1.80800   1.65140               osd.26
> 51  hdd     1.81839   1.81839               osd.51
> -2         21.30070  21.30070       rack marack2
> -12         7.76999   8.14474           host hamms
> 27  ssd     0.86299   0.99367               osd.27
> 28  ssd     0.86299   0.95961               osd.28
> 29  ssd     0.86299   0.80768               osd.29
> 30  ssd     0.86299   0.86893               osd.30
> 31  ssd     0.86299   0.92583               osd.31
> 32  ssd     0.86299   1.00227               osd.32
> 33  ssd     0.86299   0.73099               osd.33
> 34  ssd     0.86299   0.80766               osd.34
> 35  ssd     0.86299   1.04811               osd.35
> -7          5.45636   5.45636           host parabola
>  5  hdd     1.81879   1.81879               osd.5
> 12  hdd     1.81879   1.81879               osd.12
> 13  hdd     1.81879   1.81879               osd.13
> -6          2.63997   3.08183           host radium
>  2  hdd     0.87999   1.05594               osd.2
>  6  hdd     0.87999   1.10501               osd.6
> 11  hdd     0.87999   0.92088               osd.11
> -9          5.43439   5.43439           host splinter
> 16  hdd     1.80800   1.80800               osd.16
> 17  hdd     1.81839   1.81839               osd.17
> 18  hdd     1.80800   1.80800               osd.18
> -11        23.15053  23.15053       rack marack3
> -13         8.63300   8.98921           host helm
> 36  ssd     0.86299   0.71931               osd.36
> 37  ssd     0.86299   0.92601               osd.37
> 38  ssd     0.86299   0.79585               osd.38
> 39  ssd     0.86299   1.08521               osd.39
> 40  ssd     0.86299   0.89500               osd.40
> 41  ssd     0.86299   0.92351               osd.41
> 42  ssd     0.86299   0.89690               osd.42
> 43  ssd     0.86299   0.92480               osd.43
> 44  ssd     0.86299   0.84467               osd.44
> 45  ssd     0.86299   0.97795               osd.45
> -40         7.27515   7.89609           host samarium
> 46  hdd     1.81879   1.90242               osd.46
> 47  hdd     1.81879   1.86723               osd.47
> 48  hdd     1.81879   1.93404               osd.48
> 49  hdd     1.81879   2.19240               osd.49
> -4          7.24239   7.24239           host scandium
>  3  hdd     1.80800   1.76680               osd.3
>  4  hdd     1.80800   1.80800               osd.4
>  8  hdd     1.80800   1.80800               osd.8
> 52  hdd     1.81839   1.81839               osd.52
>
> Any ideas?
>
> On Wed, May 1, 2019 at 9:32 AM Joe Ryner <jry...@cait.org> wrote:
>
>> Hi,
>>
>> I have an old Ceph cluster and have recently upgraded from Luminous to
>> Nautilus. After converting to Nautilus I decided it was time to convert
>> to bluestore.
>>
>> Before the conversion the cluster was healthy, but afterwards I have a
>> HEALTH_WARN:
>>
>> # ceph health detail
>> HEALTH_WARN 1 subtrees have overcommitted pool target_size_bytes; 1 subtrees have overcommitted pool target_size_ratio
>> POOL_TARGET_SIZE_BYTES_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_bytes
>>     Pools ['data', 'metadata', 'rbd', 'images', 'locks'] overcommit available storage by 1.244x due to target_size_bytes 0 on pools []
>> POOL_TARGET_SIZE_RATIO_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_ratio
>>     Pools ['data', 'metadata', 'rbd', 'images', 'locks'] overcommit available storage by 1.244x due to target_size_ratio 0.000 on pools []
>>
>> I started with a target_size_ratio of 0.85 on the images pool and reduced
>> it to 0 in the hope of making the warning go away. The cluster seems to be
>> running fine; I just can't figure out what the problem is and how to make
>> the message go away. I restarted the monitors this morning in hopes of
>> fixing it. Anyone have any ideas?
>>
>> Thanks in advance
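(Side note for anyone trying to reproduce this: target_size_ratio and the autoscaler mode are per-pool settings, so the adjustments described above were along the lines of the following; exact invocations are from memory, so treat them as approximate.

# ceph osd pool set images pg_autoscale_mode warn
# ceph osd pool set images target_size_ratio 0.85
# ceph osd pool set images target_size_ratio 0

Turning the autoscaler off entirely can also be done at the mgr level with "ceph mgr module disable pg_autoscaler" and re-enabled with "ceph mgr module enable pg_autoscaler".)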
>> --
>> Joe Ryner
>> Associate Director
>> Center for the Application of Information Technologies (CAIT) - http://www.cait.org
>> Western Illinois University - http://www.wiu.edu
>>
>> P: (309) 298-1804
>> F: (309) 298-2806
>
> --
> Joe Ryner
> Associate Director
> Center for the Application of Information Technologies (CAIT) - http://www.cait.org
> Western Illinois University - http://www.wiu.edu
>
> P: (309) 298-1804
> F: (309) 298-2806

--
Joe Ryner
Associate Director
Center for the Application of Information Technologies (CAIT) - http://www.cait.org
Western Illinois University - http://www.wiu.edu

P: (309) 298-1804
F: (309) 298-2806