Hi, I am facing some strange behaviour where a pool is stuck. I have no idea how this pool appeared in the cluster, as I have not played with pool creation *yet*.
##### root@node1:~# ceph -s
    cluster 1b147882-722c-43d8-8dfb-38b78d9fbec3
     health HEALTH_WARN 333 pgs degraded; 333 pgs stuck unclean; pool .rgw.buckets has too few pgs
     monmap e1: 1 mons at {node1=127.0.0.1:6789/0}, election epoch 1, quorum 0 node1
     osdmap e154: 3 osds: 3 up, 3 in
      pgmap v16812: 3855 pgs, 14 pools, 41193 MB data, 24792 objects
            57236 MB used, 644 GB / 738 GB avail
                3522 active+clean
                 333 active+degraded

##### root@node1:/etc/ceph# ceph osd dump
epoch 154
fsid 1b147882-722c-43d8-8dfb-38b78d9fbec3
created 2014-04-16 20:46:46.516403
modified 2014-04-18 12:14:29.052231
flags
pool 0 'data' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 1 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0
pool 2 'rbd' rep size 1 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0
pool 3 '.rgw.root' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 16 owner 0
pool 4 '.rgw.control' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 18 owner 0
pool 5 '.rgw' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 20 owner 0
pool 6 '.rgw.gc' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 21 owner 0
pool 7 '.users.uid' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 22 owner 0
pool 8 '.users' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 26 owner 0
pool 9 '.users.swift' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 28 owner 0
pool 10 '.users.email' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 56 owner 0
pool 11 '.rgw.buckets.index' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 58 owner 18446744073709551615
pool 12 '.rgw.buckets' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 60 owner 18446744073709551615
pool 13 '' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 146 owner 18446744073709551615
max_osd 5
osd.0 up in weight 1 up_from 151 up_thru 151 down_at 148 last_clean_interval [144,147) 192.168.1.18:6800/26681 192.168.1.18:6801/26681 192.168.1.18:6802/26681 192.168.1.18:6803/26681 exists,up f6f63e8a-42af-4dda-b523-ffb835165420
osd.1 up in weight 1 up_from 149 up_thru 149 down_at 148 last_clean_interval [139,147) 192.168.1.18:6805/26685 192.168.1.18:6806/26685 192.168.1.18:6807/26685 192.168.1.18:6808/26685 exists,up fa4689ac-e0ca-4ec3-ab2a-6afa57cc7498
osd.2 up in weight 1 up_from 153 up_thru 153 down_at 148 last_clean_interval [141,147) 192.168.1.18:6810/26691 192.168.1.18:6811/26691 192.168.1.18:6812/26691 192.168.1.18:6813/26691 exists,up 6b2f7e3f-619c-4922-bdf9-bb0f2eee7413

##### root@node1:/etc/ceph# ceph pg dump_stuck unclean | sort
13.0    0 0 0 0 0 0 0 active+degraded 2014-04-18 12:14:28.438523 0'0 154:13 [0] [0] 0'0 2014-04-18 11:12:05.322855 0'0 2014-04-18 11:12:05.322855
13.100  0 0 0 0 0 0 0 active+degraded 2014-04-18 12:14:26.110633 0'0 154:13 [0] [0] 0'0 2014-04-18 11:12:06.318159 0'0 2014-04-18 11:12:06.318159
13.10   0 0 0 0 0 0 0 active+degraded 2014-04-18 12:14:37.081087 0'0 154:12 [2] [2] 0'0 2014-04-18 11:12:05.642317 0'0 2014-04-18 11:12:05.642317
13.1    0 0 0 0 0 0 0 active+degraded 2014-04-18 12:14:20.874829 0'0 154:13 [1] [1] 0'0 2014-04-18 11:12:05.580874 0'0 2014-04-18 11:12:05.580874
13.101  0 0 0 0 0 0 0 active+degraded 2014-04-18 12:14:16.723100 0'0 154:14 [1] [1] 0'0 2014-04-18 11:12:06.540975 0'0 2014-04-18 11:12:06.540975
13.102  0 0 0 0 0 0 0 active+degraded 2014-04-18 12:14:35.795491 0'0 154:12 [2] [2] 0'0 2014-04-18 11:12:06.543846 0'0 2014-04-18 11:12:06.543846
13.103  0 0 0 0 0 0 0 active+degraded 2014-04-18 12:14:35.809492 0'0 154:12 [2] [2] 0'0 2014-04-18 11:12:06.561542 0'0 2014-04-18 11:12:06.561542
13.104  0 0 0 0 0 0 0 active+degraded 2014-04-18 12:14:35.817750 0'0 154:12 [2] [2] 0'0 2014-04-18 11:12:06.569706 0'0 2014-04-18 11:12:06.569706
13.105  0 0 0 0 0 0 0 active+degraded 2014-04-18 12:14:35.840668 0'0 154:12 [2] [2] 0'0 2014-04-18 11:12:06.602826 0'0 2014-04-18 11:12:06.602826
[...]
13.f7   0 0 0 0 0 0 0 active+degraded 2014-04-18 12:14:16.990648 0'0 154:14 [1] [1] 0'0 2014-04-18 11:12:06.483859 0'0 2014-04-18 11:12:06.483859
13.f8   0 0 0 0 0 0 0 active+degraded 2014-04-18 12:14:35.947686 0'0 154:12 [2] [2] 0'0 2014-04-18 11:12:06.481459 0'0 2014-04-18 11:12:06.481459
13.f9   0 0 0 0 0 0 0 active+degraded 2014-04-18 12:14:35.961392 0'0 154:12 [2] [2] 0'0 2014-04-18 11:12:06.505039 0'0 2014-04-18 11:12:06.505039
13.fa   0 0 0 0 0 0 0 active+degraded 2014-04-18 12:14:17.062254 0'0 154:14 [1] [1] 0'0 2014-04-18 11:12:06.493605 0'0 2014-04-18 11:12:06.493605
13.fb   0 0 0 0 0 0 0 active+degraded 2014-04-18 12:14:17.058748 0'0 154:14 [1] [1] 0'0 2014-04-18 11:12:06.526013 0'0 2014-04-18 11:12:06.526013
13.fc   0 0 0 0 0 0 0 active+degraded 2014-04-18 12:14:26.277414 0'0 154:13 [0] [0] 0'0 2014-04-18 11:12:06.243714 0'0 2014-04-18 11:12:06.243714
13.fd   0 0 0 0 0 0 0 active+degraded 2014-04-18 12:14:26.312618 0'0 154:13 [0] [0] 0'0 2014-04-18 11:12:06.263824 0'0 2014-04-18 11:12:06.263824
13.fe   0 0 0 0 0 0 0 active+degraded 2014-04-18 12:14:35.977273 0'0 154:12 [2] [2] 0'0 2014-04-18 11:12:06.511879 0'0 2014-04-18 11:12:06.511879
13.ff   0 0 0 0 0 0 0 active+degraded 2014-04-18 12:14:26.262810 0'0 154:13 [0] [0] 0'0 2014-04-18 11:12:06.289603 0'0 2014-04-18 11:12:06.289603
pg_stat objects mip degr unf bytes log disklog state state_stamp v reported up acting last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp

##### root@node1:~# rados df
pool name           category         KB      objects   clones   degraded   unfound      rd     rd KB      wr     wr KB
                    -                 0            0        0          0         0       0         0       0         0
.rgw                -                 1            5        0          0         0      31        23      17         6
.rgw.buckets        -          42182267        24733        0          0         0    4485     17420  163372  50559394
.rgw.buckets.index  -                 0            3        0          0         0   47113    105894   44735         0
.rgw.control        -                 0            8        0          0         0       0         0       0         0
.rgw.gc             -                 0           32        0          0         0    7114      7704    8524         0
.rgw.root           -                 1            3        0          0         0      16        10       3         3
.users              -                 1            2        0          0         0       0         0       2         2
.users.email        -                 1            1        0          0         0       0         0       1         1
.users.swift        -                 1            2        0          0         0       5         3       2         2
.users.uid          -                 1            3        0          0         0      52        46      16         6
data                -                 0            0        0          0         0       0         0       0         0
metadata            -                 0            0        0          0         0       0         0       0         0
rbd                 -                 0            0        0          0         0       0         0       0         0
  total used        58610648        24792
  total avail      676160692
  total space      774092940

The pool seems empty, so I tried to remove it, but the command complains about the empty name.

The last change made was switching "osd pool default size" in ceph.conf from 1 to 2 and restarting the whole cluster (mon + osd); AFAICR the cluster was healthy before doing that. This is a small test bed, so everything can be trashed, but I am still a bit curious about what happened and how it could be fixed.

Cheers

--
Cédric
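PS: for completeness, the ceph.conf change amounts to nothing more than bumping this option from 1 to 2:

    [global]
    osd pool default size = 2

and the removal attempt looked something like the following (I do not recall the exact invocation, and with pool 13 having an empty name I was not sure what to pass, so take it as a rough sketch rather than a verbatim transcript):

    root@node1:~# ceph osd pool delete '' '' --yes-i-really-really-mean-it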