I think I have figured out the issue.

POOL      SIZE     TARGET SIZE   RATE   RAW CAPACITY   RATIO    TARGET RATIO   PG_NUM   NEW PG_NUM   AUTOSCALE
images    28523G                 3.0    68779G         1.2441                    1000                warn
My images pool is 28523G with a replication level of 3, and there is a total of 68779G of raw capacity. According to the documentation (http://docs.ceph.com/docs/master/rados/operations/placement-groups/):

"*SIZE* is the amount of data stored in the pool. *TARGET SIZE*, if present, is the amount of data the administrator has specified that they expect to eventually be stored in this pool. The system uses the larger of the two values for its calculation. *RATE* is the multiplier for the pool that determines how much raw storage capacity is consumed. For example, a 3 replica pool will have a ratio of 3.0, while a k=4,m=2 erasure coded pool will have a ratio of 1.5. *RAW CAPACITY* is the total amount of raw storage capacity on the OSDs that are responsible for storing this pool’s (and perhaps other pools’) data. *RATIO* is the ratio of that total capacity that this pool is consuming (i.e., ratio = size * rate / raw capacity)."

So ratio = 28523G * 3.0 / 68779G = 1.2441, meaning I am oversubscribing by 1.2441x, hence the warning.

But looking at ceph df:

# ceph df
POOL      ID   STORED    OBJECTS   USED     %USED   MAX AVAIL
images     3   9.3 TiB     2.82M   28 TiB   57.94     6.7 TiB

I believe the 9.3 TiB is the amount actually stored with thin provisioning, versus a fully provisioned 28 TiB, and the raw capacity of the cluster is only sitting at about 50% used. Shouldn't the ratio be STORED (from ceph df) * RATE (from ceph osd pool autoscale-status) / RAW CAPACITY, since Ceph uses thin provisioning in rbd? Otherwise this ratio only works for people who don't thin provision, which goes against what Ceph is doing with rbd (http://docs.ceph.com/docs/master/rbd/).
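To put the two ways of computing that ratio side by side (rough numbers; the 68779G of raw capacity in autoscale-status is the same ~67 TiB total that ceph df reports below):

  ratio as computed today:  SIZE   * RATE / RAW CAPACITY = 28523G  * 3.0 / 68779G  ~ 1.24  (oversubscribed, hence the warning)
  ratio based on STORED:    STORED * RATE / RAW CAPACITY = 9.3 TiB * 3.0 / 67 TiB  ~ 0.42  (roughly in line with the ~48% raw usage the cluster actually reports)

The second number is what I would intuitively expect the autoscaler to show for a thinly provisioned rbd pool.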
On Wed, May 1, 2019 at 11:44 AM Joe Ryner <jry...@cait.org> wrote:

> I have found a little more information. When I turn off the pg_autoscaler
> the warning goes away; turn it back on and the warning comes back.
>
> I have run the following:
>
> # ceph osd pool autoscale-status
> POOL      SIZE     TARGET SIZE   RATE   RAW CAPACITY   RATIO    TARGET RATIO   PG_NUM   NEW PG_NUM   AUTOSCALE
> images    28523G                 3.0    68779G         1.2441                    1000                warn
> locks     676.5M                 3.0    68779G         0.0000                       8                warn
> rbd       0                      3.0    68779G         0.0000                       8                warn
> data      0                      3.0    68779G         0.0000                       8                warn
> metadata  3024k                  3.0    68779G         0.0000                       8                warn
>
> # ceph df
> RAW STORAGE:
>     CLASS     SIZE       AVAIL      USED       RAW USED     %RAW USED
>     hdd       51 TiB     26 TiB     24 TiB     24 TiB       48.15
>     ssd       17 TiB     8.5 TiB    8.1 TiB    8.1 TiB      48.69
>     TOTAL     67 TiB     35 TiB     32 TiB     32 TiB       48.28
>
> POOLS:
>     POOL         ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
>     data          0     0 B               0     0 B             0       6.7 TiB
>     metadata      1     6.3 KiB          21     3.0 MiB         0       6.7 TiB
>     rbd           2     0 B               2     0 B             0       6.7 TiB
>     images        3     9.3 TiB       2.82M     28 TiB      57.94       6.7 TiB
>     locks         4     215 MiB         517     677 MiB         0       6.7 TiB
>
> It looks to me like the numbers for the images pool are not right in the
> autoscale-status output.
>
> Below is the osd crush tree:
>
> # ceph osd crush tree
> ID  CLASS  WEIGHT    (compat)   TYPE NAME
> -1         66.73337             root default
> -3         22.28214  22.28214       rack marack
> -8          7.27475   7.27475           host abacus
> 19  hdd     1.81879   1.81879               osd.19
> 20  hdd     1.81879   1.42563               osd.20
> 21  hdd     1.81879   1.81879               osd.21
> 50  hdd     1.81839   1.81839               osd.50
> -10         7.76500   6.67049           host gold
>  7  hdd     0.86299   0.83659               osd.7
>  9  hdd     0.86299   0.78972               osd.9
> 10  hdd     0.86299   0.72031               osd.10
> 14  hdd     0.86299   0.65315               osd.14
> 15  hdd     0.86299   0.72586               osd.15
> 22  hdd     0.86299   0.80528               osd.22
> 23  hdd     0.86299   0.63741               osd.23
> 24  hdd     0.86299   0.77718               osd.24
> 25  hdd     0.86299   0.72499               osd.25
> -5          7.24239   7.24239           host hassium
>  0  hdd     1.80800   1.52536               osd.0
>  1  hdd     1.80800   1.65421               osd.1
> 26  hdd     1.80800   1.65140               osd.26
> 51  hdd     1.81839   1.81839               osd.51
> -2         21.30070  21.30070       rack marack2
> -12         7.76999   8.14474           host hamms
> 27  ssd     0.86299   0.99367               osd.27
> 28  ssd     0.86299   0.95961               osd.28
> 29  ssd     0.86299   0.80768               osd.29
> 30  ssd     0.86299   0.86893               osd.30
> 31  ssd     0.86299   0.92583               osd.31
> 32  ssd     0.86299   1.00227               osd.32
> 33  ssd     0.86299   0.73099               osd.33
> 34  ssd     0.86299   0.80766               osd.34
> 35  ssd     0.86299   1.04811               osd.35
> -7          5.45636   5.45636           host parabola
>  5  hdd     1.81879   1.81879               osd.5
> 12  hdd     1.81879   1.81879               osd.12
> 13  hdd     1.81879   1.81879               osd.13
> -6          2.63997   3.08183           host radium
>  2  hdd     0.87999   1.05594               osd.2
>  6  hdd     0.87999   1.10501               osd.6
> 11  hdd     0.87999   0.92088               osd.11
> -9          5.43439   5.43439           host splinter
> 16  hdd     1.80800   1.80800               osd.16
> 17  hdd     1.81839   1.81839               osd.17
> 18  hdd     1.80800   1.80800               osd.18
> -11        23.15053  23.15053       rack marack3
> -13         8.63300   8.98921           host helm
> 36  ssd     0.86299   0.71931               osd.36
> 37  ssd     0.86299   0.92601               osd.37
> 38  ssd     0.86299   0.79585               osd.38
> 39  ssd     0.86299   1.08521               osd.39
> 40  ssd     0.86299   0.89500               osd.40
> 41  ssd     0.86299   0.92351               osd.41
> 42  ssd     0.86299   0.89690               osd.42
> 43  ssd     0.86299   0.92480               osd.43
> 44  ssd     0.86299   0.84467               osd.44
> 45  ssd     0.86299   0.97795               osd.45
> -40         7.27515   7.89609           host samarium
> 46  hdd     1.81879   1.90242               osd.46
> 47  hdd     1.81879   1.86723               osd.47
> 48  hdd     1.81879   1.93404               osd.48
> 49  hdd     1.81879   2.19240               osd.49
> -4          7.24239   7.24239           host scandium
>  3  hdd     1.80800   1.76680               osd.3
>  4  hdd     1.80800   1.80800               osd.4
>  8  hdd     1.80800   1.80800               osd.8
> 52  hdd     1.81839   1.81839               osd.52
>
> Any ideas?
>
> On Wed, May 1, 2019 at 9:32 AM Joe Ryner <jry...@cait.org> wrote:
>
>> Hi,
>>
>> I have an old Ceph cluster and have recently upgraded from Luminous to
>> Nautilus. After converting to Nautilus I decided it was time to convert
>> to bluestore.
>>
>> Before the conversion the cluster was healthy, but afterwards I have a
>> HEALTH_WARN:
>>
>> # ceph health detail
>> HEALTH_WARN 1 subtrees have overcommitted pool target_size_bytes; 1 subtrees have overcommitted pool target_size_ratio
>> POOL_TARGET_SIZE_BYTES_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_bytes
>>     Pools ['data', 'metadata', 'rbd', 'images', 'locks'] overcommit available storage by 1.244x due to target_size_bytes 0 on pools []
>> POOL_TARGET_SIZE_RATIO_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_ratio
>>     Pools ['data', 'metadata', 'rbd', 'images', 'locks'] overcommit available storage by 1.244x due to target_size_ratio 0.000 on pools []
>>
>> I started with a target_size_ratio of 0.85 on the images pool and reduced
>> it to 0 in the hope of making the warning go away. The cluster seems to be
>> running fine; I just can't figure out what the problem is and how to make
>> the message go away. I restarted the monitors this morning in hopes of
>> fixing it. Anyone have any ideas?
>>
>> Thanks in advance
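(Side note for anyone trying to reproduce this: target_size_ratio and the autoscaler mode are per-pool settings, so the adjustments described above were along the lines of the following; exact invocations are from memory, so treat them as approximate.

# ceph osd pool set images pg_autoscale_mode warn
# ceph osd pool set images target_size_ratio 0.85
# ceph osd pool set images target_size_ratio 0

Turning the autoscaler off entirely can also be done at the mgr level with "ceph mgr module disable pg_autoscaler" and re-enabled with "ceph mgr module enable pg_autoscaler".)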
>> --
>> Joe Ryner
>> Associate Director
>> Center for the Application of Information Technologies (CAIT) - http://www.cait.org
>> Western Illinois University - http://www.wiu.edu
>>
>> P: (309) 298-1804
>> F: (309) 298-2806
>
> --
> Joe Ryner
> Associate Director
> Center for the Application of Information Technologies (CAIT) - http://www.cait.org
> Western Illinois University - http://www.wiu.edu
>
> P: (309) 298-1804
> F: (309) 298-2806

--
Joe Ryner
Associate Director
Center for the Application of Information Technologies (CAIT) - http://www.cait.org
Western Illinois University - http://www.wiu.edu

P: (309) 298-1804
F: (309) 298-2806