So just a little update... after replacing the original failed drive, things
seem to be progressing a little better; however, I noticed something else
odd. Looking at 'rados df', it looks like the system thinks the data pool
holds 32 TB of data, on what is only an 18 TB raw system.
pool name       category                 KB      objects       clones     degraded      unfound           rd        rd KB           wr        wr KB
data            -               32811540110       894927            0       240445            0            1            0      2720415   4223435021
media_video     -                         1            1            0            0            0            2            1      2611361   1177389479
metadata        -                    210246        18482            0         4592            1         6970       561296      1253955     19500149
rbd             -                 330731965        82018            0        19584            0        26295      1612689     54606042   2127030019
  total used     10915771968       995428
  total avail     6657285104
  total space    17573057072


Any recommendations on how I can sort out why it thinks it has way more
data in that pool than it actually does?
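
In case it helps anyone spot the problem: the plan is to cross-check the
reported pool stats against what's actually on disk, roughly like this
(assumptions on my part: the default /var/lib/ceph/osd/ceph-<id> data paths,
and 'bytes' being the 6th column of 'ceph pg dump' -- so treat it as a
sketch, not gospel):

# raw space consumed by pool 0 (data) PG directories on this host
du -sch /var/lib/ceph/osd/ceph-*/current/0.*_head 2>/dev/null | tail -n1

# what the cluster itself thinks pool 0's PGs hold, summed from pg dump
ceph pg dump 2>/dev/null | awk '$1 ~ /^0\./ { b += $6 } END { printf "%.1f GB\n", b/1024/1024/1024 }'

If the on-disk total is nowhere near 32 TB, then presumably it's the pool
stats that are wrong rather than the data itself.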

Thanks in advance.
Berant


On Mon, May 6, 2013 at 4:43 PM, Berant Lemmenes <ber...@lemmenes.com> wrote:

> TL;DR
>
> bobtail Ceph cluster unable to finish rebalancing after a drive failure;
> usage increasing even with no clients connected...
>
>
> I've been running a test bobtail cluster for a couple of months and it's
> been working great. Last week I had a drive die and the cluster rebalance;
> during that time another OSD crashed. All was still well; since the second
> OSD had just crashed, I restarted it, made sure it re-entered properly and
> that rebalancing continued, and then I went to bed.
>
> Waking up in the morning I found 2 OSDs were 100% full and two more were
> almost full. To get out of that situation I decreased the replication size
> from 3 to 2, and then also carefully (I believe carefully enough) removed
> some PGs in order to get things started up again.
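>
> (For reference, the size change was just the standard per-pool set,
> something along the lines of:
>
> ceph osd pool set data size 2
> ceph osd pool set metadata size 2
> ceph osd pool set rbd size 2
> ceph osd pool set media_video size 2
>
> as reflected in the 'rep size 2' on all four pools further down.)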
>
> I got things going again and the cluster appeared to be rebalancing
> correctly; however, it got to the point where it stalled at 1420 PGs
> active+clean and the rest were stuck backfilling.
>
> Looking at the PG dump, all of the PGs that were having issues were on
> osd.1. So I stopped it, verified things were continuing to rebalance after
> it was down/out, and then formatted osd.1's disk and put it back in.
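>
> For anyone who wants to sanity-check that step: it was roughly the
> standard wipe-and-recreate, along these lines (device and mount options
> per my ceph.conf below, default data path assumed -- I won't swear this
> is letter-perfect):
>
> service ceph stop osd.1
> ceph osd out 1
> umount /var/lib/ceph/osd/ceph-1
> mkfs.xfs -f -i size=2048 /dev/sdd
> mount -o inode64,noatime /dev/sdd /var/lib/ceph/osd/ceph-1
> ceph-osd -i 1 --mkfs --mkjournal
> # plus putting osd.1's cephx keyring back in place, since auth is enabled
> ceph osd in 1
> service ceph start osd.1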
>
> Since then I've not been able to get the cluster back to HEALTH_OK, due to
> a combination of OSDs dying while recovering (crashes, not disk failures)
> and the used space in the cluster increasing abnormally.
>
> Right now I have all the clients disconnected and just the cluster
> rebalancing, yet usage keeps increasing: I'm at 12 TB used when I have only
> < 3 TB in cephfs and 2 TB in a single RBD image (replication size 2), which
> should come to no more than roughly (3 + 2) x 2 = 10 TB of raw space plus
> overhead. I've since shut down the cluster so I don't fill it up.
>
> My crushmap is the default; here are the usual suspects. I'm happy to
> provide additional information.
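>
> If anyone wants to eyeball the crushmap itself rather than take my word
> that it's stock, I can post the decompiled version, i.e. the output of:
>
> ceph osd getcrushmap -o crushmap.bin
> crushtool -d crushmap.bin -o crushmap.txt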
>
> pg dump: http://pastebin.com/LUyu6Z09
>
> ceph osd tree:
> osd.8 is the failed drive (I will be replacing it tonight); the reweights
> on osd.1 and osd.6 were set via reweight-by-utilization (commands noted
> just after the tree).
>
> # id weight type name up/down reweight
> -1 19.5 root default
> -3 19.5 rack unknownrack
> -2 19.5 host ceph-test
> 0 1.5 osd.0 up 1
> 1 1.5 osd.1 up 0.6027
> 2 1.5 osd.2 up 1
> 3 1.5 osd.3 up 1
> 4 1.5 osd.4 up 1
> 5 2 osd.5 up 1
> 6 2 osd.6 up 0.6676
> 7 2 osd.7 up 1
> 8 2 osd.8 down 0
> 9 2 osd.9 up 1
> 10 2 osd.10 up 1
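>
> The reweights above came straight from the built-in command, and (as far
> as I know) can be reset to normal later with a plain 'ceph osd reweight':
>
> ceph osd reweight-by-utilization
> ceph osd reweight 1 1.0
> ceph osd reweight 6 1.0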
>
>
> ceph -s:
>
>    health HEALTH_WARN 24 pgs backfill; 85 pgs backfill_toofull; 29 pgs
> backfilling; 40 pgs degraded; 1 pgs recovery_wait; 121 pgs stuck unclean;
> recovery 109306/2091318 degraded (5.227%);  recovering 3 o/s, 43344KB/s; 2
> near full osd(s); noout flag(s) set
>    monmap e2: 1 mons at {a=10.200.200.21:6789/0}, election epoch 1,
> quorum 0 a
>    osdmap e16251: 11 osds: 10 up, 10 in
>     pgmap v3145187: 1536 pgs: 1414 active+clean, 6
> active+remapped+wait_backfill, 10
> active+remapped+wait_backfill+backfill_toofull, 4
> active+degraded+wait_backfill+backfill_toofull, 22
> active+remapped+backfilling, 42 active+remapped+backfill_toofull, 7
> active+degraded+backfilling, 17 active+degraded+backfill_toofull, 1
> active+recovery_wait+remapped, 4
> active+degraded+remapped+wait_backfill+backfill_toofull, 8
> active+degraded+remapped+backfill_toofull, 1 active+clean+scrubbing+deep;
> 31607 GB data, 12251 GB used, 4042 GB / 16293 GB avail; 109306/2091318
> degraded (5.227%);  recovering 3 o/s, 43344KB/s
>    mdsmap e3363: 1/1/1 up {0=a=up:active}
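>
> Side note for anyone reading along: the per-OSD / per-PG detail behind the
> near-full and backfill_toofull states should be visible with 'ceph health
> detail' and 'ceph pg dump_stuck unclean', and I'm assuming the toofull PGs
> could be nudged along (once there's headroom again) by temporarily raising
> osd_backfill_full_ratio from its 0.85 default -- something like the below,
> though I haven't actually run the injectargs:
>
> ceph health detail
> ceph pg dump_stuck unclean
> ceph tell osd.1 injectargs '--osd-backfill-full-ratio 0.90'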
>
> rep size:
> pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 384 pgp_num 384 last_change 897 owner 0 crash_replay_interval 45
> pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 384 pgp_num 384 last_change 13364 owner 0
> pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 384 pgp_num 384 last_change 13208 owner 0
> pool 4 'media_video' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 384 pgp_num 384 last_change 890 owner 0
>
> ceph.conf:
> [global]
> auth cluster required = cephx
> auth service required = cephx
> auth client required = cephx
>
> osd pool default size = 3
> osd pool default min size = 1
> osd pool default pg num = 366
> osd pool default pgp num = 366
>
> [osd]
> osd journal size = 1000
> journal_aio = true
> #osd recovery max active = 10
>
> osd mkfs type = xfs
> osd mkfs options xfs = -f -i size=2048
> osd mount options xfs = inode64,noatime
>
> [mon.a]
> host = ceph01
> mon addr = 10.200.200.21:6789
>
> [osd.0]
> # 1.5 TB SATA
> host = ceph01
> devs = /dev/sdc
> weight = 1.5
>
> [osd.1]
> # 1.5 TB SATA
> host = ceph01
> devs = /dev/sdd
> weight = 1.5
>
> [osd.2]
> # 1.5 TB SATA
> host = ceph01
> devs = /dev/sdg
> weight = 1.5
>
> [osd.3]
> # 1.5 TB SATA
> host = ceph01
> devs = /dev/sdj
> weight = 1.5
>
> [osd.4]
> # 1.5 TB SATA
> host = ceph01
> devs = /dev/sdk
> weight = 1.5
>
> [osd.5]
> # 2 TB SAS
> host = ceph01
> devs = /dev/sdf
> weight = 2
>
> [osd.6]
> # 2 TB SAS
> host = ceph01
> devs = /dev/sdh
> weight = 2
>
> [osd.7]
> # 2 TB SAS
> host = ceph01
> devs = /dev/sda
> weight = 2
>
> [osd.8]
> # 2 TB SAS
> host = ceph01
> devs = /dev/sdb
> weight = 2
>
> [osd.9]
> # 2 TB SAS
> host = ceph01
> devs = /dev/sdi
> weight = 2
>
> [osd.10]
> # 2 TB SAS
> host = ceph01
> devs = /dev/sde
> weight = 2
>
> [mds.a]
> host = ceph01
>