Hi,

I'm trying to update my continuous integration environment, using the same deployment
method, with the following specs:
- Ubuntu Precise, Kernel 3.2, Emperor (0.72.2) - Yields a successful, healthy 
cluster.
- Ubuntu Trusty, Kernel 3.13, Firefly (0.80.5) - Yields stuck placement groups.

Here are some relevant bits from the Trusty/Firefly setup before I move on to
what I've done/tried:
http://pastebin.com/eqQTHcxU <— This was about halfway through PG healing.

So, the setup is three monitors plus two other hosts with 9 OSDs each (18 OSDs total).
At the beginning, all my placement groups were stuck unclean.

I tried the easy things first (rough commands after this list):
- set crush tunables to optimal
- run repairs/scrubs on the OSDs
- restart the OSDs
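
For reference, those steps map onto commands along these lines (a sketch; OSD 0 is
just an example ID, and the restart line assumes the standard upstart jobs that
Firefly installs on Trusty):

  ceph osd crush tunables optimal   # switch the crush tunables profile
  ceph osd scrub 0                  # repeat per OSD (0..17 here)
  ceph osd repair 0
  sudo restart ceph-osd id=0        # per OSD on each host; or: sudo restart ceph-osd-all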

Nothing happened. All ~12000 PGs remained stuck unclean since forever, in
active+remapped.
Next, I played with the crush map. I deleted the default replicated_ruleset 
rule and created a (basic) rule for each pool for the time being.
I set each pool to use its respective rule and also reduced size to 2 and
min_size to 1.
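
Roughly, that was the usual getcrushmap/crushtool round trip plus the pool settings;
<pool> and <rule-id> below are placeholders for each of my pools and its new rule:

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  # edit crushmap.txt: drop replicated_ruleset, add one basic rule per pool
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new
  ceph osd pool set <pool> crush_ruleset <rule-id>
  ceph osd pool set <pool> size 2
  ceph osd pool set <pool> min_size 1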

Still nothing, all PGs stuck.
I'm not sure why, but I tried setting the crush tunables to legacy, mostly as a
trial-and-error attempt.

Half my PGs healed almost immediately. 6082 PGs remained in active+remapped.
I tried running scrubs/repairs, but that wouldn't heal the other half. I set the
tunables back to optimal; still nothing.

I set the tunables to legacy again and most of them ended up healing, with only
1335 left in active+remapped.

The remainder of the PGs healed when I restarted the OSDs.
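
In case it helps anyone reproduce, the back and forth was nothing more exotic than
flipping the profile and watching the counts, roughly:

  ceph osd crush tunables legacy    # first switch: about half the PGs healed
  ceph pg dump_stuck unclean        # list whatever is still stuck
  ceph osd crush tunables optimal   # back to optimal: no change
  ceph osd crush tunables legacy    # legacy again: most of the rest healed
  ceph -s                           # overall cluster/PG summary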

Does anyone have a clue why this happened?
It looks like switching back and forth between tunables is what fixed the stuck PGs?

I can easily reproduce this if anyone wants more info.

Let me know!
--
David Moreau Simard
