Dear list,

Our ceph cluster (ceph version 0.87) is stuck in a warning state with some OSDs out of their original bucket:

     health HEALTH_WARN 1097 pgs degraded; 15 pgs peering; 1 pgs recovering; 1097 pgs stuck degraded; 16 pgs stuck inactive; 26148 pgs stuck unclean; 1096 pgs stuck undersized; 1096 pgs undersized; 4 requests are blocked > 32 sec; recovery 101465/6016350 objects degraded (1.686%); 1691712/6016350 objects misplaced (28.119%)
     monmap e2: 3 mons at {mon1-r2-ser=172.19.14.130:6789/0,mon1-r3-ser=172.19.14.150:6789/0,mon1-rc3-fib=172.19.14.170:6789/0}, election epoch 82, quorum 0,1,2 mon1-r2-ser,mon1-r3-ser,mon1-rc3-fib
     osdmap e15358: 144 osds: 143 up, 143 in
      pgmap v12209990: 38816 pgs, 16 pools, 8472 GB data, 1958 kobjects
            25821 GB used, 234 TB / 259 TB avail
            101465/6016350 objects degraded (1.686%); 1691712/6016350 objects misplaced (28.119%)
                 620 active
               12668 active+clean
                  15 peering
                 395 active+undersized+degraded+remapped
                   1 active+recovering+degraded
               24416 active+remapped
                   1 undersized+degraded
                 700 active+undersized+degraded
  client io 0 B/s rd, 40557 B/s wr, 13 op/s
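
In case it is useful, the commands below are what I am using to inspect the stuck PGs and the blocked requests (just the standard tooling, output omitted for brevity):

# show which PGs and OSDs the warnings refer to
ceph health detail
# dump the PGs stuck unclean / inactive together with their up and acting sets
ceph pg dump_stuck unclean
ceph pg dump_stuck inactive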

Yesterday it was just in a warning state, with some PGs stuck unclean and some requests blocked. When I restarted one of the OSDs involved, a recovery process started, some OSDs went down and then came back up, and some others were moved out of their original bucket:

# id    weight  type name       up/down reweight
-1      262.1   root default
-15     80.08           datacenter fibonacci
-16     80.08                   rack rack-c03-fib
............
-35     83.72           datacenter ingegneria
-31     0                       rack rack-01-ing
-32     0                       rack rack-02-ing
-33     0                       rack rack-03-ing
-34     0                       rack rack-04-ing
-18     83.72                   rack rack-03-ser
-13     20.02                           host-high-end cnode1-r3-ser
124     1.82                                    osd.124 up      1
126     1.82                                    osd.126 up      1
128     1.82                                    osd.128 up      1
133     1.82                                    osd.133 up      1
135     1.82                                    osd.135 up      1
…………
145     1.82                                    osd.145 up      1
146     1.82                                    osd.146 up      1
147     1.82                                    osd.147 up      1
148     1.82                                    osd.148 up      1
5       1.82            osd.5   up      1
150     1.82            osd.150 up      1
153     1.82            osd.153 up      1
80      1.82            osd.80  up      1
24      1.82            osd.24  up      1
131     1.82            osd.131 up      1
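
For what it is worth, my guess is that the restarted OSDs re-registered themselves at a default CRUSH location when they came back up. I have not verified that this is what happened, but if it is, the ceph.conf snippet below is what I would try so that OSDs stop relocating themselves at daemon start (please correct me if this option does not behave this way on 0.87):

[osd]
# do not let the OSD update its own position in the CRUSH map when the daemon starts;
# it then stays wherever it was placed by hand
osd crush update on start = false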

Now, if I put the OSDs back into their buckets by hand (see the example below) it works, but I still have some concerns: why has the recovery process stopped? The cluster is almost empty, so there should be enough space to recover the data even without those 6 OSDs. Has anyone run into this before?
Any advice on what to look for?
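
For reference, this is the kind of command I mean by putting an OSD back by hand (the target bucket is only an example here, each OSD of course goes back to its own host):

# re-place osd.5 with weight 1.82 under a host bucket; "host-high-end" is the
# custom bucket type we use in our CRUSH map
ceph osd crush set osd.5 1.82 host-high-end=cnode1-r3-ser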
Any help is appreciated.

Regards
Simone



--
Simone Spinelli <simone.spine...@unipi.it>
Università di Pisa
Settore Rete, Telecomunicazioni e Fonia - Serra
Direzione Edilizia e Telecomunicazioni

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
