Hi,

I am facing a strange behaviour where a pool is stuck. I have no idea
how this pool appeared in the cluster, since I have not played with
pool creation *yet*.
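
(For anyone who wants to reproduce the overview: a plain pool listing,
assuming the standard CLI, should also show the nameless pool next to the
rgw ones; output omitted, the detailed dumps follow below.)

# list pool ids and names; the suspect pool shows up with an empty name
ceph osd lspools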

##### root@node1:~# ceph -s
    cluster 1b147882-722c-43d8-8dfb-38b78d9fbec3
     health HEALTH_WARN 333 pgs degraded; 333 pgs stuck unclean; pool
.rgw.buckets has too few pgs
     monmap e1: 1 mons at {node1=127.0.0.1:6789/0}, election epoch 1,
quorum 0 node1
     osdmap e154: 3 osds: 3 up, 3 in
      pgmap v16812: 3855 pgs, 14 pools, 41193 MB data, 24792 objects
            57236 MB used, 644 GB / 738 GB avail
                3522 active+clean
                 333 active+degraded
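
(I have not pasted it here, but "ceph health detail" should break the
warning above down per PG and per pool, assuming the usual command:)

# per-PG / per-pool breakdown of the HEALTH_WARN above (output omitted)
ceph health detail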

##### root@node1:/etc/ceph# ceph osd dump
epoch 154
fsid 1b147882-722c-43d8-8dfb-38b78d9fbec3
created 2014-04-16 20:46:46.516403
modified 2014-04-18 12:14:29.052231
flags

pool 0 'data' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 1 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0
pool 2 'rbd' rep size 1 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0
pool 3 '.rgw.root' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 16 owner 0
pool 4 '.rgw.control' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 18 owner 0
pool 5 '.rgw' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 20 owner 0
pool 6 '.rgw.gc' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 21 owner 0
pool 7 '.users.uid' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 22 owner 0
pool 8 '.users' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 26 owner 0
pool 9 '.users.swift' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 28 owner 0
pool 10 '.users.email' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 56 owner 0
pool 11 '.rgw.buckets.index' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 58 owner 18446744073709551615
pool 12 '.rgw.buckets' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 60 owner 18446744073709551615
pool 13 '' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 333 pgp_num 333 last_change 146 owner 18446744073709551615

max_osd 5
osd.0 up   in  weight 1 up_from 151 up_thru 151 down_at 148
last_clean_interval [144,147) 192.168.1.18:6800/26681
192.168.1.18:6801/26681 192.168.1.18:6802/26681 192.168.1.18:6803/26681
exists,up f6f63e8a-42af-4dda-b523-ffb835165420
osd.1 up   in  weight 1 up_from 149 up_thru 149 down_at 148
last_clean_interval [139,147) 192.168.1.18:6805/26685
192.168.1.18:6806/26685 192.168.1.18:6807/26685 192.168.1.18:6808/26685
exists,up fa4689ac-e0ca-4ec3-ab2a-6afa57cc7498
osd.2 up   in  weight 1 up_from 153 up_thru 153 down_at 148
last_clean_interval [141,147) 192.168.1.18:6810/26691
192.168.1.18:6811/26691 192.168.1.18:6812/26691 192.168.1.18:6813/26691
exists,up 6b2f7e3f-619c-4922-bdf9-bb0f2eee7413
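
(Side note: all three OSDs sit on the same host/IP, 192.168.1.18, which I
assume matters for anything with rep size 2 if the CRUSH rules place
replicas on separate hosts. The layout and rules can be checked with the
usual dumps, for example:)

# show where the OSDs live in the CRUSH hierarchy (all under node1 here)
ceph osd tree
# dump the CRUSH rules to see how replicas are chosen for ruleset 0
ceph osd crush rule dump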

##### root@node1:/etc/ceph# ceph pg dump_stuck unclean |sort
13.0    0    0    0    0    0    0    0    active+degraded    2014-04-18
12:14:28.438523    0'0    154:13    [0]    [0]    0'0    2014-04-18
11:12:05.322855    0'0    2014-04-18 11:12:05.322855
13.100    0    0    0    0    0    0    0    active+degraded   
2014-04-18 12:14:26.110633    0'0    154:13    [0]    [0]    0'0   
2014-04-18 11:12:06.318159    0'0    2014-04-18 11:12:06.318159
13.10    0    0    0    0    0    0    0    active+degraded   
2014-04-18 12:14:37.081087    0'0    154:12    [2]    [2]    0'0   
2014-04-18 11:12:05.642317    0'0    2014-04-18 11:12:05.642317
13.1    0    0    0    0    0    0    0    active+degraded    2014-04-18
12:14:20.874829    0'0    154:13    [1]    [1]    0'0    2014-04-18
11:12:05.580874    0'0    2014-04-18 11:12:05.580874
13.101    0    0    0    0    0    0    0    active+degraded   
2014-04-18 12:14:16.723100    0'0    154:14    [1]    [1]    0'0   
2014-04-18 11:12:06.540975    0'0    2014-04-18 11:12:06.540975
13.102    0    0    0    0    0    0    0    active+degraded   
2014-04-18 12:14:35.795491    0'0    154:12    [2]    [2]    0'0   
2014-04-18 11:12:06.543846    0'0    2014-04-18 11:12:06.543846
13.103    0    0    0    0    0    0    0    active+degraded   
2014-04-18 12:14:35.809492    0'0    154:12    [2]    [2]    0'0   
2014-04-18 11:12:06.561542    0'0    2014-04-18 11:12:06.561542
13.104    0    0    0    0    0    0    0    active+degraded   
2014-04-18 12:14:35.817750    0'0    154:12    [2]    [2]    0'0   
2014-04-18 11:12:06.569706    0'0    2014-04-18 11:12:06.569706
13.105    0    0    0    0    0    0    0    active+degraded   
2014-04-18 12:14:35.840668    0'0    154:12    [2]    [2]    0'0   
2014-04-18 11:12:06.602826    0'0    2014-04-18 11:12:06.602826

[...]

13.f7    0    0    0    0    0    0    0    active+degraded   
2014-04-18 12:14:16.990648    0'0    154:14    [1]    [1]    0'0   
2014-04-18 11:12:06.483859    0'0    2014-04-18 11:12:06.483859
13.f8    0    0    0    0    0    0    0    active+degraded   
2014-04-18 12:14:35.947686    0'0    154:12    [2]    [2]    0'0   
2014-04-18 11:12:06.481459    0'0    2014-04-18 11:12:06.481459
13.f9    0    0    0    0    0    0    0    active+degraded   
2014-04-18 12:14:35.961392    0'0    154:12    [2]    [2]    0'0   
2014-04-18 11:12:06.505039    0'0    2014-04-18 11:12:06.505039
13.fa    0    0    0    0    0    0    0    active+degraded   
2014-04-18 12:14:17.062254    0'0    154:14    [1]    [1]    0'0   
2014-04-18 11:12:06.493605    0'0    2014-04-18 11:12:06.493605
13.fb    0    0    0    0    0    0    0    active+degraded   
2014-04-18 12:14:17.058748    0'0    154:14    [1]    [1]    0'0   
2014-04-18 11:12:06.526013    0'0    2014-04-18 11:12:06.526013
13.fc    0    0    0    0    0    0    0    active+degraded   
2014-04-18 12:14:26.277414    0'0    154:13    [0]    [0]    0'0   
2014-04-18 11:12:06.243714    0'0    2014-04-18 11:12:06.243714
13.fd    0    0    0    0    0    0    0    active+degraded   
2014-04-18 12:14:26.312618    0'0    154:13    [0]    [0]    0'0   
2014-04-18 11:12:06.263824    0'0    2014-04-18 11:12:06.263824
13.fe    0    0    0    0    0    0    0    active+degraded   
2014-04-18 12:14:35.977273    0'0    154:12    [2]    [2]    0'0   
2014-04-18 11:12:06.511879    0'0    2014-04-18 11:12:06.511879
13.ff    0    0    0    0    0    0    0    active+degraded   
2014-04-18 12:14:26.262810    0'0    154:13    [0]    [0]    0'0   
2014-04-18 11:12:06.289603    0'0    2014-04-18 11:12:06.289603
pg_stat    objects    mip    degr    unf    bytes    log    disklog   
state    state_stamp    v    reported    up    acting    last_scrub   
scrub_stamp    last_deep_scrub    deep_scrub_stamp
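
(Every stuck PG belongs to pool 13 and has only a single OSD in its
up/acting set even though that pool has rep size 2, which presumably
explains the degraded state. A single PG can be inspected in more detail
with something like the following, 13.0 being taken from the list above:)

# detailed peering information for one of the stuck PGs (output omitted)
ceph pg 13.0 query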


##### root@node1:~# rados df
pool name           category                KB      objects       clones     degraded      unfound           rd        rd KB           wr        wr KB
                    -                        0            0            0            0            0            0            0            0            0
.rgw                -                        1            5            0            0            0           31           23           17            6
.rgw.buckets        -                 42182267        24733            0            0            0         4485        17420       163372     50559394
.rgw.buckets.index  -                        0            3            0            0            0        47113       105894        44735            0
.rgw.control        -                        0            8            0            0            0            0            0            0            0
.rgw.gc             -                        0           32            0            0            0         7114         7704         8524            0
.rgw.root           -                        1            3            0            0            0           16           10            3            3
.users              -                        1            2            0            0            0            0            0            2            2
.users.email        -                        1            1            0            0            0            0            0            1            1
.users.swift        -                        1            2            0            0            0            5            3            2            2
.users.uid          -                        1            3            0            0            0           52           46           16            6
data                -                        0            0            0            0            0            0            0            0            0
metadata            -                        0            0            0            0            0            0            0            0            0
rbd                 -                        0            0            0            0            0            0            0            0            0
  total used        58610648        24792
  total avail      676160692
  total space      774092940


The pool seems empty, so I tried to remove it, but the command complains
about the empty name. The last modification I made was changing
"osd pool default size" in ceph.conf from 1 to 2 and restarting the whole
cluster (mon + osd); AFAICR the cluster was healthy before doing that.
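
(For completeness, the removal attempt was essentially the standard pool
delete with the empty name quoted, roughly as below; renaming the pool
first might be a workaround, but I have not tried it. "deleteme" is just an
arbitrary placeholder name.)

# this is what fails, complaining about the empty pool name
ceph osd pool delete '' '' --yes-i-really-really-mean-it
# possible workaround (untested): give the pool a real name, then delete it
ceph osd pool rename '' deleteme
ceph osd pool delete deleteme deleteme --yes-i-really-really-mean-it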

This is a small test bed, so everything can be trashed, but I am still a
bit curious about what happened and how it could be fixed?

Cheers

-- 
Cédric

