I think this is the issue. Look at ceph health detail and you will see that 0.21 and others are orphaned:

HEALTH_WARN 65 pgs stale; 22 pgs stuck inactive; 65 pgs stuck stale; 22 pgs stuck unclean; too many PGs per OSD (456 > max 300)
pg 0.21 is stuck inactive since forever, current state creating, last acting []
pg 0.7 is stuck inactive since forever, current state creating, last acting []
pg 5.2 is stuck inactive since forever, current state creating, last acting []
pg 1.7 is stuck inactive since forever, current state creating, last acting []
pg 0.34 is stuck inactive since forever, current state creating, last acting []
pg 0.33 is stuck inactive since forever, current state creating, last acting []
pg 5.1 is stuck inactive since forever, current state creating, last acting []
pg 0.1b is stuck inactive since forever, current state creating, last acting []
pg 0.32 is stuck inactive since forever, current state creating, last acting []
pg 1.2 is stuck inactive since forever, current state creating, last acting []
pg 0.31 is stuck inactive since forever, current state creating, last acting []
pg 2.0 is stuck inactive since forever, current state creating, last acting []
pg 5.7 is stuck inactive since forever, current state creating, last acting []
pg 1.0 is stuck inactive since forever, current state creating, last acting []
pg 2.2 is stuck inactive since forever, current state creating, last acting []
pg 0.16 is stuck inactive since forever, current state creating, last acting []
pg 0.15 is stuck inactive since forever, current state creating, last acting []
pg 0.2b is stuck inactive since forever, current state creating, last acting []
pg 0.3f is stuck inactive since forever, current state creating, last acting []
pg 0.27 is stuck inactive since forever, current state creating, last acting []
pg 0.3c is stuck inactive since forever, current state creating, last acting []
pg 0.3a is stuck inactive since forever, current state creating, last acting []
pg 0.21 is stuck unclean since forever, current state creating, last acting []
pg 0.7 is stuck unclean since forever, current state creating, last acting []
pg 5.2 is stuck unclean since forever, current state creating, last acting []
pg 1.7 is stuck unclean since forever, current state creating, last acting []
pg 0.34 is stuck unclean since forever, current state creating, last acting []
pg 0.33 is stuck unclean since forever, current state creating, last acting []
pg 5.1 is stuck unclean since forever, current state creating, last acting []
pg 0.1b is stuck unclean since forever, current state creating, last acting []
pg 0.32 is stuck unclean since forever, current state creating, last acting []
pg 1.2 is stuck unclean since forever, current state creating, last acting []
pg 0.31 is stuck unclean since forever, current state creating, last acting []
pg 2.0 is stuck unclean since forever, current state creating, last acting []
pg 5.7 is stuck unclean since forever, current state creating, last acting []
pg 1.0 is stuck unclean since forever, current state creating, last acting []
pg 2.2 is stuck unclean since forever, current state creating, last acting []
pg 0.16 is stuck unclean since forever, current state creating, last acting []
pg 0.15 is stuck unclean since forever, current state creating, last acting []
pg 0.2b is stuck unclean since forever, current state creating, last acting []
pg 0.3f is stuck unclean since forever, current state creating, last acting []
pg 0.27 is stuck unclean since forever, current state creating, last acting []
pg 0.3c is stuck unclean since forever, current state creating, last acting []
pg 0.3a is stuck unclean since forever, current state creating, last acting []
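A quicker way to see just these PGs (rather than scanning the full health output) should be the dump_stuck subcommand already used further down the thread, something along these lines:

# ceph pg dump_stuck inactive
# ceph pg dump_stuck unclean

Both should list the same 22 PGs, all sitting in "creating" with an empty acting set. The "too many PGs per OSD (456 > max 300)" part is a separate warning (the 300 threshold should be the mon_pg_warn_max_per_osd default, if I remember correctly).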
On Sun, Jun 7, 2015 at 8:39 AM, Alex Muntada <al...@alexm.org> wrote:

> That happened also to us, but after moving the OSDs with blocked requests
> out of the cluster it eventually regained health OK.
>
> Running ceph health detail should list those OSDs. Do you have any?
>
> On 07/06/2015 16:16, "Marek Dohojda" <mdoho...@altitudedigital.com> wrote:
>
>> Thank you. Unfortunately this won't work, because 0.21 is already being
>> created:
>>
>> ~# ceph pg force_create_pg 0.21
>> pg 0.21 already creating
>>
>> I think, and I am guessing here since I don't know the internals that
>> well, that 0.21 started to be created, but since its OSD disappeared it
>> never finished and it keeps trying.
>>
>> On Sun, Jun 7, 2015 at 12:18 AM, Alex Muntada <al...@alexm.org> wrote:
>>
>>> Marek Dohojda:
>>>
>>>> One of the stuck inactive PGs is 0.21 and here is the output of
>>>> ceph pg map:
>>>>
>>>> #ceph pg map 0.21
>>>> osdmap e579 pg 0.21 (0.21) -> up [] acting []
>>>>
>>>> #ceph pg dump_stuck stale
>>>> ok
>>>> pg_stat state up up_primary acting acting_primary
>>>> 0.22 stale+active+clean [5,1,6] 5 [5,1,6] 5
>>>> 0.1f stale+active+clean [2,0,4] 2 [2,0,4] 2
>>>> <redacted for ease of reading>
>>>>
>>>> # ceph osd stat
>>>> osdmap e579: 14 osds: 14 up, 14 in
>>>>
>>>> If I do
>>>> #ceph pg 0.21 query
>>>> the command freezes and never returns any output.
>>>>
>>>> I suspect that the problem is that these PGs were created but the OSD
>>>> that they were initially created under disappeared. So I believe that I
>>>> should just remove these PGs, but honestly I don't see how.
>>>>
>>>> Does anybody have any ideas as to what to do next?
>>>
>>> ceph pg force_create_pg 0.21
>>>
>>> We've been playing last week with this same scenario: we stopped on
>>> purpose the 3 OSDs with the replicas of one PG to find out how it
>>> affected the cluster, and we ended up with a stale PG and 400 requests
>>> blocked for a long time. After trying several commands to get the
>>> cluster back, the one that made the difference was force_create_pg,
>>> and later moving the OSD with blocked requests out of the cluster.
>>>
>>> Hope that helps,
>>> Alex
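For the archives, the sequence Alex describes would look roughly like this; osd.5 below is only a placeholder, the real IDs should be whichever OSDs ceph health detail reports with blocked requests:

# ceph pg force_create_pg 0.21     (repeat per stuck PG)
# ceph osd out 5                   (move the OSD with blocked requests out so CRUSH remaps its PGs)
# ceph -w                          (watch until the PGs peer and go active+clean)

and, to cover all 22 PGs in one go instead of one by one, something like:

# for pg in $(ceph pg dump_stuck inactive | awk '/creating/ {print $1}'); do ceph pg force_create_pg $pg; done

As noted above, force_create_pg only answers "already creating" here, so the remaining step from Alex's recipe is marking the suspect OSDs out and watching whether the PGs finally peer.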