So just a little update... after replacing the original failed drive, things seem to be progressing a little better; however, I noticed something else odd. Looking at 'rados df', it looks like the system thinks the data pool holds 32 TB of data, but this is only an 18 TB raw system.
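Doing a quick back-of-the-envelope on the 'rados df' numbers below (assuming the KB columns are kibibytes): the data pool reports 32811540110 KB, roughly 30.6 TiB of logical data, while 'total used' is only 10915771968 KB (about 10.2 TiB) out of 17573057072 KB (about 16.4 TiB) of raw space. In other words, the pool's reported size is about three times what is actually consumed on disk and nearly double the raw capacity of the whole cluster.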
pool name       category  KB           objects  clones  degraded  unfound  rd     rd KB    wr        wr KB
data            -         32811540110  894927   0       240445    0        1      0        2720415   4223435021
media_video     -         1            1        0       0         0        2      1        2611361   1177389479
metadata        -         210246       18482    0       4592      1        6970   561296   1253955   19500149
rbd             -         330731965    82018    0       19584     0        26295  1612689  54606042  2127030019
  total used      10915771968   995428
  total avail      6657285104
  total space     17573057072

Any recommendations on how I can sort out why it thinks it has far more data in that pool than it actually does? Thanks in advance.

Berant

On Mon, May 6, 2013 at 4:43 PM, Berant Lemmenes <ber...@lemmenes.com> wrote:

> TL;DR
>
> Bobtail Ceph cluster unable to finish rebalancing after a drive failure; usage increasing even with no clients connected...
>
> I've been running a test bobtail cluster for a couple of months and it's been working great. Last week a drive died and the cluster started rebalancing; during that time another OSD crashed. All was still well; however, since the second OSD had just crashed, I restarted it, made sure it re-entered properly and that rebalancing continued, and then went to bed.
>
> Waking up in the morning, I found 2 OSDs were 100% full and two more were almost full. To get out of the situation I decreased the replication size from 3 to 2, and also carefully (I believe carefully enough) removed some PGs in order to start things up again.
>
> I got things going again and they appeared to be rebalancing correctly; however, it got to the point where it stopped at 1420 PGs active+clean and the rest were stuck backfilling.
>
> Looking at the PG dump, all of the PGs that were having issues were on osd.1. So I stopped it, verified that things continued to rebalance after it was down/out, and then formatted osd.1's disk and put it back in.
>
> Since then I've not been able to get the cluster back to HEALTHY, due to a combination of OSDs dying while recovering (not due to disk failure, just crashes) as well as the used space in the cluster increasing abnormally.
>
> Right now I have all the clients disconnected and just the cluster rebalancing, and the usage is increasing to the point where I have 12 TB used when I have only < 3 TB in CephFS and 2 TB in a single RBD image (replication size 2). I've since shut down the cluster so I don't fill it up.
>
> My crush map is the default; here are the usual suspects. I'm happy to provide additional information.
>
> pg dump: http://pastebin.com/LUyu6Z09
>
> ceph osd tree:
> osd.8 is the failed drive (I will be replacing tonight), weight on osd.1 and osd.6 was done via reweight-by-utilization
>
> # id    weight  type name       up/down reweight
> -1      19.5    root default
> -3      19.5            rack unknownrack
> -2      19.5                    host ceph-test
> 0       1.5                             osd.0   up      1
> 1       1.5                             osd.1   up      0.6027
> 2       1.5                             osd.2   up      1
> 3       1.5                             osd.3   up      1
> 4       1.5                             osd.4   up      1
> 5       2                               osd.5   up      1
> 6       2                               osd.6   up      0.6676
> 7       2                               osd.7   up      1
> 8       2                               osd.8   down    0
> 9       2                               osd.9   up      1
> 10      2                               osd.10  up      1
>
> ceph -s:
>
>   health HEALTH_WARN 24 pgs backfill; 85 pgs backfill_toofull; 29 pgs backfilling; 40 pgs degraded; 1 pgs recovery_wait; 121 pgs stuck unclean; recovery 109306/2091318 degraded (5.227%); recovering 3 o/s, 43344KB/s; 2 near full osd(s); noout flag(s) set
>   monmap e2: 1 mons at {a=10.200.200.21:6789/0}, election epoch 1, quorum 0 a
>   osdmap e16251: 11 osds: 10 up, 10 in
>   pgmap v3145187: 1536 pgs: 1414 active+clean, 6 active+remapped+wait_backfill, 10 active+remapped+wait_backfill+backfill_toofull, 4 active+degraded+wait_backfill+backfill_toofull, 22 active+remapped+backfilling, 42 active+remapped+backfill_toofull, 7 active+degraded+backfilling, 17 active+degraded+backfill_toofull, 1 active+recovery_wait+remapped, 4 active+degraded+remapped+wait_backfill+backfill_toofull, 8 active+degraded+remapped+backfill_toofull, 1 active+clean+scrubbing+deep; 31607 GB data, 12251 GB used, 4042 GB / 16293 GB avail; 109306/2091318 degraded (5.227%); recovering 3 o/s, 43344KB/s
>   mdsmap e3363: 1/1/1 up {0=a=up:active}
>
> rep size:
> pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 384 pgp_num 384 last_change 897 owner 0 crash_replay_interval 45
> pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 384 pgp_num 384 last_change 13364 owner 0
> pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 384 pgp_num 384 last_change 13208 owner 0
> pool 4 'media_video' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 384 pgp_num 384 last_change 890 owner 0
>
> ceph.conf:
> [global]
> auth cluster required = cephx
> auth service required = cephx
> auth client required = cephx
>
> osd pool default size = 3
> osd pool default min size = 1
> osd pool default pg num = 366
> osd pool default pgp num = 366
>
> [osd]
> osd journal size = 1000
> journal_aio = true
> #osd recovery max active = 10
>
> osd mkfs type = xfs
> osd mkfs options xfs = -f -i size=2048
> osd mount options xfs = inode64,noatime
>
> [mon.a]
> host = ceph01
> mon addr = 10.200.200.21:6789
>
> [osd.0]
> # 1.5 TB SATA
> host = ceph01
> devs = /dev/sdc
> weight = 1.5
>
> [osd.1]
> # 1.5 TB SATA
> host = ceph01
> devs = /dev/sdd
> weight = 1.5
>
> [osd.2]
> # 1.5 TB SATA
> host = ceph01
> devs = /dev/sdg
> weight = 1.5
>
> [osd.3]
> # 1.5 TB SATA
> host = ceph01
> devs = /dev/sdj
> weight = 1.5
>
> [osd.4]
> # 1.5 TB SATA
> host = ceph01
> devs = /dev/sdk
> weight = 1.5
>
> [osd.5]
> # 2 TB SAS
> host = ceph01
> devs = /dev/sdf
> weight = 2
>
> [osd.6]
> # 2 TB SAS
> host = ceph01
> devs = /dev/sdh
> weight = 2
>
> [osd.7]
> # 2 TB SAS
> host = ceph01
> devs = /dev/sda
> weight = 2
>
> [osd.8]
> # 2 TB SAS
> host = ceph01
> devs = /dev/sdb
> weight = 2
>
> [osd.9]
> # 2 TB SAS
> host = ceph01
> devs = /dev/sdi
> weight = 2
>
> [osd.10]
> # 2 TB SAS
> host = ceph01
> devs = /dev/sde
> weight = 2
>
> [mds.a]
> host = ceph01
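P.S. In case it helps frame suggestions, here is a rough sketch of what I was planning to check next (these may well be the wrong things to look at, and I'm assuming the bobtail pg dump / osd dump output formats match these greps, so corrections welcome):

# compare the object count that 'rados df' reports for the data pool
# against what is actually listable in the pool
rados -p data ls | wc -l

# per-PG stats for the data pool (pool id 0), to see whether a handful
# of PGs account for the unexpectedly large byte counts
ceph pg dump | grep '^0\.'

# confirm the pools really did pick up the change from rep size 3 to 2
ceph osd dump | grep 'rep size'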