I think this is the issue. Look at ceph health detail and you will see that
0.21 and the others are orphaned:
HEALTH_WARN 65 pgs stale; 22 pgs stuck inactive; 65 pgs stuck stale; 22 pgs stuck unclean; too many PGs per OSD (456 > max 300)
pg 0.21 is stuck inactive since forever, current state creating, last acting []
pg 0.7 is stuck inactive since forever, current state creating, last acting []
pg 5.2 is stuck inactive since forever, current state creating, last acting []
pg 1.7 is stuck inactive since forever, current state creating, last acting []
pg 0.34 is stuck inactive since forever, current state creating, last acting []
pg 0.33 is stuck inactive since forever, current state creating, last acting []
pg 5.1 is stuck inactive since forever, current state creating, last acting []
pg 0.1b is stuck inactive since forever, current state creating, last acting []
pg 0.32 is stuck inactive since forever, current state creating, last acting []
pg 1.2 is stuck inactive since forever, current state creating, last acting []
pg 0.31 is stuck inactive since forever, current state creating, last acting []
pg 2.0 is stuck inactive since forever, current state creating, last acting []
pg 5.7 is stuck inactive since forever, current state creating, last acting []
pg 1.0 is stuck inactive since forever, current state creating, last acting []
pg 2.2 is stuck inactive since forever, current state creating, last acting []
pg 0.16 is stuck inactive since forever, current state creating, last acting []
pg 0.15 is stuck inactive since forever, current state creating, last acting []
pg 0.2b is stuck inactive since forever, current state creating, last acting []
pg 0.3f is stuck inactive since forever, current state creating, last acting []
pg 0.27 is stuck inactive since forever, current state creating, last acting []
pg 0.3c is stuck inactive since forever, current state creating, last acting []
pg 0.3a is stuck inactive since forever, current state creating, last acting []
pg 0.21 is stuck unclean since forever, current state creating, last acting []
pg 0.7 is stuck unclean since forever, current state creating, last acting []
pg 5.2 is stuck unclean since forever, current state creating, last acting []
pg 1.7 is stuck unclean since forever, current state creating, last acting []
pg 0.34 is stuck unclean since forever, current state creating, last acting []
pg 0.33 is stuck unclean since forever, current state creating, last acting []
pg 5.1 is stuck unclean since forever, current state creating, last acting []
pg 0.1b is stuck unclean since forever, current state creating, last acting []
pg 0.32 is stuck unclean since forever, current state creating, last acting []
pg 1.2 is stuck unclean since forever, current state creating, last acting []
pg 0.31 is stuck unclean since forever, current state creating, last acting []
pg 2.0 is stuck unclean since forever, current state creating, last acting []
pg 5.7 is stuck unclean since forever, current state creating, last acting []
pg 1.0 is stuck unclean since forever, current state creating, last acting []
pg 2.2 is stuck unclean since forever, current state creating, last acting []
pg 0.16 is stuck unclean since forever, current state creating, last acting []
pg 0.15 is stuck unclean since forever, current state creating, last acting []
pg 0.2b is stuck unclean since forever, current state creating, last acting []
pg 0.3f is stuck unclean since forever, current state creating, last acting []
pg 0.27 is stuck unclean since forever, current state creating, last acting []
pg 0.3c is stuck unclean since forever, current state creating, last acting []
pg 0.3a is stuck unclean since forever, current state creating, last acting []
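
All of the stuck PGs belong to pools 0, 1, 2 and 5 (the number before the
dot in a pg id is the pool id). In case it helps, this is roughly what I
would check next; these are just standard ceph CLI commands, so treat it as
a sketch rather than a recipe:

# which pools do the stuck PGs (0.x, 1.x, 2.x, 5.x) belong to?
ceph osd lspools

# compact view of only the stuck PGs
ceph pg dump_stuck inactive
ceph pg dump_stuck unclean

# per-pool pg_num/pgp_num and size, to see where the
# "too many PGs per OSD (456 > max 300)" warning comes from
ceph osd dump | grep ^pool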


On Sun, Jun 7, 2015 at 8:39 AM, Alex Muntada <al...@alexm.org> wrote:

> That happened to us as well, but after moving the OSDs with blocked
> requests out of the cluster it eventually went back to HEALTH_OK.
>
> Running ceph health detail should list those OSDs. Do you have any?
> On 07/06/2015 16:16, "Marek Dohojda" <mdoho...@altitudedigital.com>
> wrote:
>
>> Thank you. Unfortunately this won't work, because 0.21 is already being
>> created:
>> ~# ceph pg force_create_pg 0.21
>> pg 0.21 already creating
>>
>>
>> I think, and I am guessing here since I don't know the internals that
>> well, that 0.21 started to be created, but since its OSD disappeared it
>> never finished and it keeps retrying.
>>
>> On Sun, Jun 7, 2015 at 12:18 AM, Alex Muntada <al...@alexm.org> wrote:
>>
>>> Marek Dohojda:
>>>
>>>> One of the stuck inactive PGs is 0.21, and here is the output of ceph pg map:
>>>>
>>>> #ceph pg map 0.21
>>>> osdmap e579 pg 0.21 (0.21) -> up [] acting []
>>>>
>>>> #ceph pg dump_stuck stale
>>>> ok
>>>> pg_stat state   up      up_primary      acting  acting_primary
>>>> 0.22    stale+active+clean      [5,1,6] 5       [5,1,6] 5
>>>> 0.1f    stale+active+clean      [2,0,4] 2       [2,0,4] 2
>>>> <redacted for ease of reading>
>>>>
>>>> # ceph osd stat
>>>>      osdmap e579: 14 osds: 14 up, 14 in
>>>>
>>>> If I do
>>>> #ceph pg 0.21 query
>>>>
>>>> The command freezes and never returns any output.
>>>>
>>>> I suspect that the problem is that these PGs were created, but the OSD
>>>> that they were initially created on disappeared. So I believe that I
>>>> should just remove these PGs, but honestly I don't see how.
>>>>
>>>> Does anybody have any ideas as to what to do next?
>>>>
>>>
>>> ceph pg force_create_pg 0.21
>>>
>>> We were playing with this same scenario last week: we stopped on purpose
>>> the 3 OSDs holding the replicas of one PG to find out how it affected the
>>> cluster, and we ended up with a stale PG and 400 requests blocked for a
>>> long time. After trying several commands to get the cluster back, the one
>>> that made the difference was force_create_pg, followed by moving the OSDs
>>> with blocked requests out of the cluster.
>>>
>>> Hope that helps,
>>> Alex
>>>
>>
>>
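
One more thought, since force_create_pg reports 0.21 as already creating and
the acting set is empty: it may be worth checking whether CRUSH can still map
those PGs to any OSDs at all (for instance if a crush rule still points at
hosts or OSDs that no longer exist). This is only a rough sketch with the
standard tools; the /tmp path is just an example:

# do the crush tree and rules still reference OSDs that actually exist?
ceph osd tree
ceph osd crush rule dump

# grab the current osdmap and test the mappings for pool 0 offline
ceph osd getmap -o /tmp/osdmap
osdmaptool /tmp/osdmap --test-map-pgs --pool 0
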
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
