Just to follow up on this:
I ended up enabling the balancer module in upmap mode.
This did resolve the short-term issue and evened things out a
bit...but things are still far from uniform.
It seems like the balancer option is an ongoing process that continues
to run over time...so maybe things will improve even more over the next
few weeks.
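For anyone finding this thread later, here is a rough sketch of the steps involved in turning on the balancer in upmap mode on a Luminous (12.2.x) cluster. The `run` wrapper and the `DRY_RUN` variable are my own illustration, not part of Ceph; by default the script only prints the commands so they can be reviewed before anything is changed.

```shell
#!/bin/sh
# Sketch: enable the mgr balancer in upmap mode.
# DRY_RUN (hypothetical helper variable) defaults to 1, so commands are only printed.
# Set DRY_RUN=0 to actually execute them against the cluster.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enable_upmap_balancer() {
    # upmap needs all clients to be Luminous or newer
    run ceph osd set-require-min-compat-client luminous
    run ceph mgr module enable balancer
    run ceph balancer mode upmap
    run ceph balancer on
}

enable_upmap_balancer

# Check what the balancer is doing and how it scores the current distribution
run ceph balancer status
run ceph balancer eval
```

As far as I know, the set-require-min-compat-client step is refused if older clients are still connected, so it is worth checking the output of 'ceph features' first.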
Thank you to everyone who helped provide insight into possible solutions.
Shain
On 4/30/19 2:08 PM, Dan van der Ster wrote:
Removing pools won't make a difference.
Read up to slide 22 here:
https://www.slideshare.net/mobile/Inktank_Ceph/ceph-day-berlin-mastering-ceph-operations-upmap-and-the-mgr-balancer
..
Dan
(Apologies for terseness, I'm mobile)
On Tue, 30 Apr 2019, 20:02 Shain Miley, <smi...@npr.org> wrote:
Here is the per pool pg_num info:
'data' pg_num 64
'metadata' pg_num 64
'rbd' pg_num 64
'npr_archive' pg_num 6775
'.rgw.root' pg_num 64
'.rgw.control' pg_num 64
'.rgw' pg_num 64
'.rgw.gc' pg_num 64
'.users.uid' pg_num 64
'.users.email' pg_num 64
'.users' pg_num 64
'.usage' pg_num 64
'.rgw.buckets.index' pg_num 128
'.intent-log' pg_num 8
'.rgw.buckets' pg_num 64
'kube' pg_num 512
'.log' pg_num 8
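As a quick sanity check on the listing above, the per-pool values can be summed to see how many of the PGs belong to npr_archive. This is only a sketch: the `pg_summary` helper name is made up for illustration, and the input is simply the listing pasted into a here-document.

```shell
#!/bin/sh
# Sum pg_num over all pools and report the share held by one pool.
# Input lines look like:  'poolname' pg_num N
pg_summary() {
    awk -v big="npr_archive" '
        {
            gsub(/\047/, "", $1)          # strip the single quotes around the pool name
            total += $3
            if ($1 == big) main = $3
        }
        END {
            printf "total=%d %s=%d share=%.1f%%\n", total, big, main, 100 * main / total
        }
    '
}

pg_summary <<'EOF'
'data' pg_num 64
'metadata' pg_num 64
'rbd' pg_num 64
'npr_archive' pg_num 6775
'.rgw.root' pg_num 64
'.rgw.control' pg_num 64
'.rgw' pg_num 64
'.rgw.gc' pg_num 64
'.users.uid' pg_num 64
'.users.email' pg_num 64
'.users' pg_num 64
'.usage' pg_num 64
'.rgw.buckets.index' pg_num 128
'.intent-log' pg_num 8
'.rgw.buckets' pg_num 64
'kube' pg_num 512
'.log' pg_num 8
EOF
```

The total comes out to the 8199 PGs mentioned in the original message, with roughly 83% of them already in npr_archive.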
Here is the df output:
GLOBAL:
    SIZE        AVAIL      RAW USED     %RAW USED
    1.06PiB     306TiB     778TiB       71.75
POOLS:
    NAME                   ID     USED        %USED     MAX AVAIL     OBJECTS
    data                   0      11.7GiB     0.14      8.17TiB       3006
    metadata               1      0B          0         8.17TiB       0
    rbd                    2      43.2GiB     0.51      8.17TiB       11147
    npr_archive            3      258TiB      97.93     5.45TiB       82619649
    .rgw.root              4      1001B       0         8.17TiB       5
    .rgw.control           5      0B          0         8.17TiB       8
    .rgw                   6      6.16KiB     0         8.17TiB       35
    .rgw.gc                7      0B          0         8.17TiB       32
    .users.uid             8      0B          0         8.17TiB       0
    .users.email           9      0B          0         8.17TiB       0
    .users                 10     0B          0         8.17TiB       0
    .usage                 11     0B          0         8.17TiB       1
    .rgw.buckets.index     12     0B          0         8.17TiB       26
    .intent-log            17     0B          0         5.45TiB       0
    .rgw.buckets           18     24.2GiB     0.29      8.17TiB       6622
    kube                   21     1.82GiB     0.03      5.45TiB       550
    .log                   22     0B          0         5.45TiB       176
The stuff in the data pool and the rgw pools is old data that we used
for testing...if you guys think that removing everything outside of rbd
and npr_archive would make a significant impact I will give it a try.
Thanks,
Shain
On 4/30/19 1:15 PM, Jack wrote:
> Hi,
>
> I see that you are using rgw
> RGW comes with many pools, yet most of them are used for metadata and
> configuration, those do not store many data
> Such pools do not need more than a couple PG, each (I use pg_num = 8)
>
> You need to allocate your pg on pool that actually stores the data
>
> Please do the following, to let us know more:
> Print the pg_num per pool:
> for i in $(rados lspools); do echo -n "$i: "; ceph osd pool get "$i" pg_num; done
>
> Print the usage per pool:
> ceph df
>
> Also, instead of doing a "ceph osd reweight-by-utilization", check out
> the balancer plugin: http://docs.ceph.com/docs/mimic/mgr/balancer/
>
> Finally, in nautilus, the pg can now upscale and downscale automatically
> See https://ceph.com/rados/new-in-nautilus-pg-merging-and-autotuning/
>
>
> On 04/30/2019 06:34 PM, Shain Miley wrote:
>> Hi,
>>
>> We have a cluster with 235 OSDs running version 12.2.11 with a
>> combination of 4 and 6 TB drives. The data distribution across OSDs
>> varies from 52% to 94%.
>>
>> I have been trying to figure out how to get this a bit more balanced as
>> we are running into 'backfillfull' issues on a regular basis.
>>
>> I've tried adding more pgs...but this did not seem to do much in terms
>> of the imbalance.
>>
>> Here is the end output from 'ceph osd df':
>>
>> MIN/MAX VAR: 0.73/1.31 STDDEV: 7.73
>>
>> We have 8199 pgs total with 6775 of them in the pool that has 97% of the
>> data.
>>
>> The other pools are not really used (data, metadata, .rgw.root,
>> .rgw.control, etc). I have thought about deleting those unused pools so
>> that most if not all the pgs are being used by the pool with the
>> majority of the data.
>>
>> However...before I do that...is there anything else I can do or try in
>> order to see if I can balance out the data more uniformly?
>>
>> Thanks in advance,
>>
>> Shain
>>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
NPR | Shain Miley | Manager of Infrastructure, Digital Media |
smi...@npr.org | 202.513.3649