Sorry, to set things in context: I had some other problems last weekend, and setting the tunables to optimal helped (although I am on the older kernel). Since it worked, I was inclined to believe that the tunables do work on the older kernel.
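As an aside, one way to confirm which tunables a cluster is actually running with is to dump and decompile the CRUSH map. This is only a rough sketch; the /tmp paths are placeholders:

    # Save the current CRUSH map and decompile it to plain text
    ceph osd getcrushmap -o /tmp/crushmap
    crushtool -d /tmp/crushmap -o /tmp/crushmap.txt

    # With non-legacy tunables set, the decompiled map should start with
    # "tunable ..." lines, e.g. "tunable chooseleaf_descend_once 1"
    head /tmp/crushmap.txt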
That being said, I will upgrade the kernel to see if this issue goes away.

Regards,
Wai Peng

On Tue, Jun 4, 2013 at 12:01 PM, YIP Wai Peng <yi...@comp.nus.edu.sg> wrote:
> Hi Sage,
>
> It is on optimal tunables already. However, I'm on kernel
> 2.6.32-358.6.2.el6.x86_64. Will the tunables take effect or do I have to
> upgrade to something newer?
>
> - WP
>
>
> On Tue, Jun 4, 2013 at 11:58 AM, Sage Weil <s...@inktank.com> wrote:
>
>> On Tue, 4 Jun 2013, YIP Wai Peng wrote:
>> > Hi all,
>> > I'm running ceph on CentOS6 on 3 hosts, with 3 OSDs each (9 OSDs in total).
>> > When I increased one of my pools' rep size from 2 to 3, 6 PGs got stuck
>> > in active+clean+degraded mode, but ceph doesn't create new replicas.
>>
>> My first guess is that you do not have the newer crush tunables set and
>> some placements are not quite right. If you are prepared for some data
>> migration, and are not using an older kernel client, try
>>
>>  ceph osd crush tunables optimal
>>
>> sage
>>
>>
>> >
>> > One of the problematic PGs has the following (snipped for brevity)
>> >
>> > { "state": "active+clean+degraded",
>> >   "epoch": 1329,
>> >   "up": [
>> >         4,
>> >         6],
>> >   "acting": [
>> >         4,
>> >         6],
>> > <snip>
>> >   "recovery_state": [
>> >         { "name": "Started\/Primary\/Active",
>> >           "enter_time": "2013-06-04 01:10:30.092977",
>> >           "might_have_unfound": [
>> >                 { "osd": 3,
>> >                   "status": "already probed"},
>> >                 { "osd": 5,
>> >                   "status": "not queried"},
>> >                 { "osd": 6,
>> >                   "status": "already probed"}],
>> > <snip>
>> >
>> >
>> > I tried force_create_pg but it gets stuck in "creating". Any ideas on how
>> > to "kickstart" this node into creating the correct number of replicas?
>> >
>> >
>> > PS: I have the following crush rule for the pool, which makes the replicas
>> > go to different hosts.
>> > host1 has OSD 0,1,2
>> > host2 has OSD 3,4,5
>> > host3 has OSD 6,7,8
>> > Looking at it, the new replica should be going to OSD 0, 1 or 2, but ceph
>> > is not creating it?
>> >
>> > rule different_host {
>> >         ruleset 3
>> >         type replicated
>> >         min_size 1
>> >         max_size 10
>> >         step take default
>> >         step chooseleaf firstn 0 type host
>> >         step emit
>> > }
>> >
>> >
>> > Any help will be much appreciated. Cheers
>> > - Wai Peng
>> >
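A side note for anyone debugging something similar: the different_host rule in the quoted message can be sanity-checked with crushtool's test mode, which simulates placements without touching the cluster. The sketch below assumes the compiled CRUSH map was saved to /tmp/crushmap as in the earlier snippet:

    # Simulate placements for ruleset 3 with 3 replicas; "bad mappings" are
    # inputs that could not be mapped to 3 distinct OSDs.
    crushtool -i /tmp/crushmap --test --rule 3 --num-rep 3 --show-bad-mappings

    # Or print every simulated mapping to eyeball that the three OSDs
    # always land on three different hosts:
    crushtool -i /tmp/crushmap --test --rule 3 --num-rep 3 --show-mappings

If --show-bad-mappings prints nothing, the rule itself can satisfy size 3, and the stuck PGs are more likely the legacy-tunables placement issue Sage mentions above.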