Hello Andras,

Some initial observations and questions: 

The total PG recommendation for this cluster would actually be 8192 PGs per the 
formula. 

Total PGs = (90 * 100) / 2 = 4500 

Next power of 2 = 8192. 

Per the documentation, the result should then be rounded up to the nearest power 
of two. Rounding up is optional, but recommended so that CRUSH can evenly 
balance the number of objects among placement groups.

How many data pools are being used for storing objects?

'ceph osd dump |grep pool'

Also, how are these 90 OSDs laid out across the 8 hosts, and is there any 
discrepancy between disk sizes and weights?

'ceph osd tree'

Also, what are you using for CRUSH tunables, and which Ceph release are you on?

'ceph osd crush show-tunables'
'ceph -v'
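
If your release is new enough to have it (Hammer and later, if I remember 
correctly), 'ceph osd df' also gives a quick per-OSD view of utilization and PG 
counts, which should make the variance you're describing visible at a glance:

'ceph osd df'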

Thanks,

----- Original Message -----
From: "Andras Pataki" <apat...@simonsfoundation.org>
To: ceph-users@lists.ceph.com
Sent: Monday, September 21, 2015 2:00:29 PM
Subject: [ceph-users] Uneven data distribution across OSDs

Hi ceph users, 

I am using CephFS for file storage and I have noticed that the data gets 
distributed very unevenly across OSDs. 


    * I have about 90 OSDs across 8 hosts, and 4096 PGs for the cephfs_data 
pool with 2 replicas, which is in line with the total PG recommendation of 
“Total PGs = (OSDs * 100) / pool_size” from the docs. 
    * CephFS distributes the data pretty much evenly across the PGs as shown by 
‘ceph pg dump’ 
    * However, the number of PGs assigned to the various OSDs (per weight 
unit/terabyte) varies quite a lot. The fullest OSD has as many as 44 PGs per 
terabyte (weight unit), while the emptier ones have as few as 19 or 20. 
    * Even if I consider the total number of PGs for all pools per OSD, the 
numbers vary just as wildly as for the cephfs_data pool alone. 
As a result, when the whole CephFS file system is 60% full, some of the OSDs 
already reach the 95% full condition, and no more data can be written to the 
system. 

Is there any way to force a more even distribution of PGs to OSDs? I am using 
the default CRUSH map, with two levels (root/host). Can any changes to the 
CRUSH map help? I would really like to get higher disk utilization than 60% 
without one of the 90 disks filling up so early. 

Thanks, 

Andras 


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
