Can you post the output of

ceph daemon osd.xx config show? (probably as an attachment).
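
For a quick look at just the recovery/backfill-related values, something like 
this over the admin socket also works (substitute a real OSD id for "12"):

  $ ceph daemon osd.12 config show | grep -E 'recovery|backfill'
  $ ceph daemon osd.12 config get osd_recovery_delay_start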

There are several things I've seen cause this:
1) too many PGs but too few degraded objects can make it seem "slow" (if you 
just have 2 degraded objects but restarted a host with 10K PGs, it probably 
has to scan all of those PGs)
2) sometimes the process gets stuck when a toofull condition occurs
3) sometimes the process gets stuck for no apparent reason - restarting the 
currently backfilling/recovering OSDs fixes it
bumping osd_recovery_threads sometimes fixes both 2) and 3), but usually doesn't
4) setting osd_recovery_delay_start to anything > 0 makes recovery slow (even 
0.0000001 makes it much slower than a plain 0). On the other hand, we had to 
set it high as a default because of slow ops when restarting OSDs, which this 
partially fixed. (A sketch of checking/tweaking these settings at runtime is 
below.)
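
If you want to poke at 2)-4) at runtime, something along these lines should 
work on hammer - the OSD id and values are only examples, and injected values 
don't survive an OSD restart:

  # relax the delay and bump the recovery/backfill knobs on all OSDs
  $ ceph tell osd.* injectargs '--osd_recovery_delay_start 0'
  $ ceph tell osd.* injectargs '--osd_max_backfills 2 --osd_recovery_max_active 5'
  # see which PGs are backfilling, then restart a stuck OSD from that set
  $ ceph health detail | grep backfill
  $ sudo service ceph restart osd.12   # or however your init system restarts OSDs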

Can you see any bottleneck in the system? CPU spinning, disks busy reading? I 
don't think this is the issue, but make sure it's not something more obvious...
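
For a quick check on one of the OSD hosts while backfill is running (assuming 
sysstat is installed for iostat/sar), e.g.:

  $ iostat -x 2     # %util near 100 on the SATA disks or the journal SSD
  $ top             # a ceph-osd process pinning a core
  $ sar -n DEV 2    # the single 1Gb backend NIC topping out around ~110 MB/s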

Jan


> On 02 Sep 2015, at 22:34, Bob Ababurko <b...@ababurko.net> wrote:
> 
> When I lose a disk OR replace an OSD in my POC ceph cluster, it takes a very 
> long time to rebalance.  I should note that my cluster is slightly unique in 
> that I am using cephfs (shouldn't matter?) and it currently contains about 310 
> million objects.
> 
> The last time I replaced a disk/OSD was 2.5 days ago and it is still 
> rebalancing.  This is on a cluster with no client load.
> 
> The configuration is 5 hosts with 6 x 1TB 7200rpm SATA OSDs & 1 850 Pro SSD 
> which contains the journals for said OSDs.  That means 30 OSDs in total.  
> The system disk is on its own disk.  I'm also using a backend network with a 
> single Gb NIC.  The rebalancing rate (objects/s) seems to be very slow when 
> it is close to finishing....say <1% objects misplaced.
> 
> It doesn't seem right that it would take 2+ days to rebalance a 1TB disk with 
> no load on the cluster.  Are my expectations off?
> 
> I'm not sure if my pg_num/pgp_num needs to be changed OR whether the 
> rebalance time is dependent on the number of objects in the pool.  These are 
> thoughts I've had but am not certain are relevant here. 
> 
> $ sudo ceph -v
> ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
> 
> $ sudo ceph -s
>     cluster f25cb23f-2293-4682-bad2-4b0d8ad10e79
>      health HEALTH_WARN
>             5 pgs backfilling
>             5 pgs stuck unclean
>             recovery 3046506/676638611 objects misplaced (0.450%)
>      monmap e1: 3 mons at 
> {cephmon01=10.15.24.71:6789/0,cephmon02=10.15.24.80:6789/0,cephmon03=10.15.24.135:6789/0}
>             election epoch 20, quorum 0,1,2 cephmon01,cephmon02,cephmon03
>      mdsmap e6070: 1/1/1 up {0=cephmds01=up:active}, 1 up:standby
>      osdmap e4395: 30 osds: 30 up, 30 in; 5 remapped pgs
>       pgmap v3100039: 2112 pgs, 3 pools, 6454 GB data, 321 Mobjects
>             18319 GB used, 9612 GB / 27931 GB avail
>             3046506/676638611 objects misplaced (0.450%)
>                 2095 active+clean
>                   12 active+clean+scrubbing+deep
>                    5 active+remapped+backfilling
> recovery io 2294 kB/s, 147 objects/s
> 
> $ sudo rados df
> pool name                 KB      objects       clones     degraded      unfound           rd        rd KB           wr        wr KB
> cephfs_data       6767569962    335746702            0            0            0      2136834            1    676984208   7052266742
> cephfs_metadata        42738      1058437            0            0            0     16130199  30718800215    295996938   3811963908
> rbd                        0            0            0            0            0            0            0            0            0
>   total used     19209068780    336805139
>   total avail    10079469460
>   total space    29288538240
> 
> $ sudo ceph osd pool get cephfs_data pgp_num
> pg_num: 1024
> $ sudo ceph osd pool get cephfs_metadata pgp_num
> pg_num: 1024
> 
> 
> thanks,
> Bob

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
