
Thanks for your answer. In fact I have several different problems, which
I tried to solve separately:

1) I lost 2 OSDs, and some pools have only 2 replicas. So some data was
lost.
2) One monitor refuses the Cuttlefish upgrade, so I only have 4 of 5
monitors running.
3) I have 4 old inconsistent PGs that I can't repair.

So here is the status:

   health HEALTH_ERR 15 pgs incomplete; 4 pgs inconsistent; 15 pgs stuck
inactive; 15 pgs stuck unclean; 1 near full osd(s); 19 scrub errors;
noout flag(s) set; 1 mons down, quorum 0,1,2,3 a,b,c,e
   monmap e7: 5 mons at
 election epoch 2584, quorum 0,1,2,3 a,b,c,e
   osdmap e82502: 50 osds: 48 up, 48 in
    pgmap v12807617: 7824 pgs: 7803 active+clean,
1 active+clean+scrubbing, 15 incomplete, 4 active+clean+inconsistent,
1 active+clean+scrubbing+deep; 5676 GB data, 18948 GB used,
18315 GB / 37263 GB avail; 137KB/s rd, 1852KB/s wr, 199op/s
   mdsmap e1: 0/0/1 up
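
As a quick sanity check on the numbers in that status, the pgmap summary can be parsed and tallied. This is only a rough sketch: the string below is the pgmap line copied from the status with the mail line-wrapping undone, and the function name summarize_pgmap is made up for this illustration (it is not part of any Ceph tool).

```python
import re

# The pgmap summary from the status above, with the mail line-wrapping undone.
PGMAP = (
    "pgmap v12807617: 7824 pgs: 7803 active+clean, 1 active+clean+scrubbing, "
    "15 incomplete, 4 active+clean+inconsistent, 1 active+clean+scrubbing+deep; "
    "5676 GB data, 18948 GB used, 18315 GB / 37263 GB avail"
)


def summarize_pgmap(line):
    """Return (total_pgs, {state: count}, used_gb, capacity_gb)."""
    total = int(re.search(r"(\d+) pgs:", line).group(1))
    # The per-state counts sit between "pgs:" and the first ";".
    states_part = line.split("pgs:", 1)[1].split(";", 1)[0]
    states = {s: int(n) for n, s in re.findall(r"(\d+) ([a-z+]+)", states_part)}
    used = int(re.search(r"(\d+) GB used", line).group(1))
    capacity = int(re.search(r"/ (\d+) GB avail", line).group(1))
    return total, states, used, capacity


total, states, used, capacity = summarize_pgmap(PGMAP)
# PGs whose state mentions "incomplete" or "inconsistent" are the problem ones.
problem = {s: n for s, n in states.items()
           if "incomplete" in s or "inconsistent" in s}
print("states add up to total:", sum(states.values()) == total)
print("problem PGs:", problem)
print("raw space used: %.1f%%" % (100.0 * used / capacity))
```

So 19 of the 7824 PGs are affected (15 incomplete, 4 inconsistent), and raw usage is about half of the capacity.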

The tree:

# id    weight  type name       up/down reweight
-8      14.26   root SSDroot
-27     8               datacenter SSDrbx2
-26     8                       room SSDs25
-25     8                               net SSD188-165-12
-24     8                                       rack SSD25B09
-23     8                                               host lyll
46      2                                                       osd.46  up      
47      2                                                       osd.47  up      
48      2                                                       osd.48  up      
49      2                                                       osd.49  up      
-10     4.26            datacenter SSDrbx3
-12     2                       room SSDs43
-13     2                               net SSD178-33-122
-16     2                                       rack SSD43S01
-17     2                                               host kaino
42      1                                                       osd.42  up      
43      1                                                       osd.43  up      
-22     2.26                    room SSDs45
-21     2.26                            net SSD5-135-138
-20     2.26                                    rack SSD45F01
-19     2.26                                            host taman
44      1.13                                                    osd.44  up      
45      1.13                                                    osd.45  up      
-9      2               datacenter SSDrbx4
-11     2                       room SSDs52
-14     2                               net SSD176-31-226
-15     2                                       rack SSD52B09
-18     2                                               host dragan
40      1                                                       osd.40  up      
41      1                                                       osd.41  up      
-1      33.43   root SASroot
-100    15.9            datacenter SASrbx1
-90     15.9                    room SASs15
-72     15.9                            net SAS188-165-15
-40     8                                       rack SAS15B01
-3      8                                               host brontes
0       1                                                       osd.0   up      
1       1                                                       osd.1   up      
2       1                                                       osd.2   up      
3       1                                                       osd.3   up      
4       1                                                       osd.4   up      
5       1                                                       osd.5   up      
6       1                                                       osd.6   up      
7       1                                                       osd.7   up      
-41     7.9                                     rack SAS15B02
-6      7.9                                             host alim
24      1                                                       osd.24  up      
25      1                                                       osd.25  down    
26      1                                                       osd.26  up      
27      1                                                       osd.27  up      
28      1                                                       osd.28  up      
29      1                                                       osd.29  up      
30      1                                                       osd.30  up      
31      0.9                                                     osd.31  up      
-101    17.53           datacenter SASrbx2
-91     17.53                   room SASs27
-70     1.6                             net SAS188-165-13
-44     0                                       rack SAS27B04
-7      0                                               host bul
-45     1.6                                     rack SAS27B06
-4      1.6                                             host okko
32      0.2                                                     osd.32  up      
33      0.2                                                     osd.33  up      
34      0.2                                                     osd.34  up      
35      0.2                                                     osd.35  up      
36      0.2                                                     osd.36  up      
37      0.2                                                     osd.37  up      
38      0.2                                                     osd.38  up      
39      0.2                                                     osd.39  up      
-71     15.93                           net SAS188-165-14
-42     8                                       rack SAS27A03
-5      8                                               host noburo
8       1                                                       osd.8   up      
9       1                                                       osd.9   up      
18      1                                                       osd.18  up      
19      1                                                       osd.19  up      
20      1                                                       osd.20  up      
21      1                                                       osd.21  up      
22      1                                                       osd.22  up      
23      1                                                       osd.23  up      
-43     7.93                                    rack SAS27A04
-2      7.93                                            host keron
10      0.97                                                    osd.10  up      
11      1                                                       osd.11  up      
12      1                                                       osd.12  up      
13      1                                                       osd.13  up      
14      0.98                                                    osd.14  up      
15      1                                                       osd.15  down    
16      0.98                                                    osd.16  up      
17      1                                                       osd.17  up      

Here I have 2 roots: SSDroot and SASroot. All my OSD/PG problems are on
the SAS branch, and my CRUSH rules replicate per "net".
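
To see which "net" each down OSD sits in, the tree output can be walked line by line. The sketch below is only an illustration: TREE_EXCERPT is a shortened copy of the SAS branch of the tree (most rows left out), and the helper name osds_by_net is an assumption made for this example. It relies on the tree being printed depth-first, so the last "net" and "host" rows seen are the parents of each OSD row.

```python
# Shortened excerpt of the `ceph osd tree` output above (SAS branch only,
# most rows omitted). Columns: id, weight, type + name (or osd name), status.
TREE_EXCERPT = """\
# id    weight  type name       up/down reweight
-1      33.43   root SASroot
-72     15.9                            net SAS188-165-15
-6      7.9                                             host alim
24      1                                                       osd.24  up
25      1                                                       osd.25  down
-71     15.93                           net SAS188-165-14
-5      8                                               host noburo
19      1                                                       osd.19  up
-2      7.93                                            host keron
15      1                                                       osd.15  down
16      0.98                                                    osd.16  up
"""


def osds_by_net(tree_text):
    """Return a list of (osd, net, host, status) tuples."""
    rows = []
    net = host = None
    for line in tree_text.splitlines():
        fields = line.split()
        # Skip the header row and anything too short to be a tree row.
        if len(fields) < 4 or fields[0] == "#":
            continue
        third, fourth = fields[2], fields[3]
        if third == "net":
            net = fourth
        elif third == "host":
            host = fourth
        elif third.startswith("osd."):
            # For OSD rows the fourth column is the up/down status.
            rows.append((third, net, host, fourth))
    return rows


down = [row for row in osds_by_net(TREE_EXCERPT) if row[3] == "down"]
for osd, net, host, status in down:
    print("%s is %s (host %s, net %s)" % (osd, status, host, net))
```

On this excerpt the two down OSDs, osd.25 and osd.15, are in two different nets (SAS188-165-15 and SAS188-165-14).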

osd.15 has had a failing disk for a long time; its data was correctly
moved (the OSD stayed out until the cluster reached HEALTH_OK).
osd.25 is a buggy OSD that I can't remove or change: if I rebalance
its PGs onto other OSDs, those other OSDs crash. That problem occurred
before I lost osd.19: the OSD was unable to mark that PG as
inconsistent since it was crashing during scrub. In my view, all the
inconsistencies come from this OSD.
osd.19 was a failing disk, which I replaced.

And the health detail:

HEALTH_ERR 15 pgs incomplete; 4 pgs inconsistent; 15 pgs stuck inactive;
15 pgs stuck unclean; 1 near full osd(s); 19 scrub errors; noout flag(s)
set; 1 mons down, quorum 0,1,2,3 a,b,c,e
pg 4.5c is stuck inactive since forever, current state incomplete, last
acting [19,30]
pg 8.71d is stuck inactive since forever, current state incomplete, last
acting [24,19]
pg 8.3fa is stuck inactive since forever, current state incomplete, last
acting [19,31]
pg 8.3e0 is stuck inactive since forever, current state incomplete, last
acting [31,19]
pg 8.56c is stuck inactive since forever, current state incomplete, last
acting [19,28]
pg 8.19f is stuck inactive since forever, current state incomplete, last
acting [31,19]
pg 8.792 is stuck inactive since forever, current state incomplete, last
acting [19,28]
pg 4.0 is stuck inactive since forever, current state incomplete, last
acting [28,19]
pg 8.78a is stuck inactive since forever, current state incomplete, last
acting [31,19]
pg 8.23e is stuck inactive since forever, current state incomplete, last
acting [32,13]
pg 8.2ff is stuck inactive since forever, current state incomplete, last
acting [6,19]
pg 8.5e2 is stuck inactive since forever, current state incomplete, last
acting [0,19]
pg 8.528 is stuck inactive since forever, current state incomplete, last
acting [31,19]
pg 8.20f is stuck inactive since forever, current state incomplete, last
acting [31,19]
pg 8.372 is stuck inactive since forever, current state incomplete, last
acting [19,24]
pg 4.5c is stuck unclean since forever, current state incomplete, last
acting [19,30]
pg 8.71d is stuck unclean since forever, current state incomplete, last
acting [24,19]
pg 8.3fa is stuck unclean since forever, current state incomplete, last
acting [19,31]
pg 8.3e0 is stuck unclean since forever, current state incomplete, last
acting [31,19]
pg 8.56c is stuck unclean since forever, current state incomplete, last
acting [19,28]
pg 8.19f is stuck unclean since forever, current state incomplete, last
acting [31,19]
pg 8.792 is stuck unclean since forever, current state incomplete, last
acting [19,28]
pg 4.0 is stuck unclean since forever, current state incomplete, last
acting [28,19]
pg 8.78a is stuck unclean since forever, current state incomplete, last
acting [31,19]
pg 8.23e is stuck unclean since forever, current state incomplete, last
acting [32,13]
pg 8.2ff is stuck unclean since forever, current state incomplete, last
acting [6,19]
pg 8.5e2 is stuck unclean since forever, current state incomplete, last
acting [0,19]
pg 8.528 is stuck unclean since forever, current state incomplete, last
acting [31,19]
pg 8.20f is stuck unclean since forever, current state incomplete, last
acting [31,19]
pg 8.372 is stuck unclean since forever, current state incomplete, last
acting [19,24]
pg 8.792 is incomplete, acting [19,28]
pg 8.78a is incomplete, acting [31,19]
pg 8.71d is incomplete, acting [24,19]
pg 8.5e2 is incomplete, acting [0,19]
pg 8.56c is incomplete, acting [19,28]
pg 8.528 is incomplete, acting [31,19]
pg 8.3fa is incomplete, acting [19,31]
pg 8.3e0 is incomplete, acting [31,19]
pg 8.372 is incomplete, acting [19,24]
pg 8.2ff is incomplete, acting [6,19]
pg 8.23e is incomplete, acting [32,13]
pg 8.20f is incomplete, acting [31,19]
pg 8.19f is incomplete, acting [31,19]
pg 3.7c is active+clean+inconsistent, acting [24,13,39]
pg 3.6b is active+clean+inconsistent, acting [28,23,5]
pg 4.5c is incomplete, acting [19,30]
pg 3.d is active+clean+inconsistent, acting [29,4,11]
pg 4.0 is incomplete, acting [28,19]
pg 3.1 is active+clean+inconsistent, acting [28,19,5]
osd.10 is near full at 85%
19 scrub errors
noout flag(s) set
mon.d (rank 4) addr is down (out of quorum)

Pools 4 and 8 have only 2 replicas, and pool 3 has 3 replicas but
inconsistent data.
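
The per-pool picture can be checked directly from the health detail, since the part of a PG id before the dot is the pool id. The following is a minimal sketch, not an official tool: HEALTH_DETAIL holds the "pg ... is ..., acting [...]" lines copied from the output above, and the names PG_LINE and parse_pgs are invented for this example.

```python
import re
from collections import Counter, defaultdict

# The "pg <id> is <state>, acting [...]" lines from the health detail above.
HEALTH_DETAIL = """\
pg 8.792 is incomplete, acting [19,28]
pg 8.78a is incomplete, acting [31,19]
pg 8.71d is incomplete, acting [24,19]
pg 8.5e2 is incomplete, acting [0,19]
pg 8.56c is incomplete, acting [19,28]
pg 8.528 is incomplete, acting [31,19]
pg 8.3fa is incomplete, acting [19,31]
pg 8.3e0 is incomplete, acting [31,19]
pg 8.372 is incomplete, acting [19,24]
pg 8.2ff is incomplete, acting [6,19]
pg 8.23e is incomplete, acting [32,13]
pg 8.20f is incomplete, acting [31,19]
pg 8.19f is incomplete, acting [31,19]
pg 3.7c is active+clean+inconsistent, acting [24,13,39]
pg 3.6b is active+clean+inconsistent, acting [28,23,5]
pg 4.5c is incomplete, acting [19,30]
pg 3.d is active+clean+inconsistent, acting [29,4,11]
pg 4.0 is incomplete, acting [28,19]
pg 3.1 is active+clean+inconsistent, acting [28,19,5]
"""

PG_LINE = re.compile(r"^pg (\d+)\.([0-9a-f]+) is ([a-z+]+), acting \[([\d,]+)\]$")


def parse_pgs(text):
    """Return a list of (pool_id, pgid, state, acting_osds) tuples."""
    pgs = []
    for line in text.splitlines():
        m = PG_LINE.match(line.strip())
        if m:
            pool, pg, state, acting = m.groups()
            osds = [int(o) for o in acting.split(",")]
            pgs.append((int(pool), "%s.%s" % (pool, pg), state, osds))
    return pgs


by_pool = defaultdict(Counter)   # pool id -> Counter of PG states
osd_hits = Counter()             # OSD id -> number of incomplete PGs it acts for
for pool, pgid, state, acting in parse_pgs(HEALTH_DETAIL):
    by_pool[pool][state] += 1
    if state == "incomplete":
        osd_hits.update(acting)

for pool in sorted(by_pool):
    print("pool %d: %s" % (pool, dict(by_pool[pool])))
print("OSDs most often in incomplete acting sets:", osd_hits.most_common(3))
```

On this data, pool 8 has 13 incomplete PGs, pool 4 has 2, and pool 3 has the 4 inconsistent ones. osd.19 appears in 14 of the 15 incomplete acting sets; the only exception is pg 8.23e with [32,13].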

Thanks in advance.

On Friday, 17 May 2013 at 00:14 -0700, John Wilkins wrote:
> If you can follow the documentation here:
> http://ceph.com/docs/master/rados/operations/monitoring-osd-pg/  and
> http://ceph.com/docs/master/rados/troubleshooting/  to provide some
> additional information, we may be better able to help you.
> For example, "ceph osd tree" would help us understand the status of
> your cluster a bit better.
> On Thu, May 16, 2013 at 10:32 PM, Olivier Bonvalet <ceph.l...@daevel.fr> 
> wrote:
> > On Wednesday, 15 May 2013 at 00:15 +0200, Olivier Bonvalet wrote:
> >> Hi,
> >>
> >> I have some PGs in state down and/or incomplete on my cluster, because I
> >> lost 2 OSDs and a pool had only 2 replicas. So of course that
> >> data is lost.
> >>
> >> My problem now is that I can't get back to a "HEALTH_OK" status: if I try
> >> to remove, read or overwrite the corresponding RBD images, nearly all OSDs
> >> hang (well... they don't do anything and requests stay in a growing
> >> queue, until production goes down).
> >>
> >> So, what can I do to remove those corrupted images?
> >>
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >
> > Bump. Can nobody help me with this problem?
> >
> > Thanks,
> >
> > Olivier
> -- 
> John Wilkins
> Senior Technical Writer
> Inktank
> john.wilk...@inktank.com
> (415) 425-9599
> http://inktank.com
