After adding a new ceph-osd, the cluster seems to be back to normal, but there is still a warning in ceph health detail. During the previous three months, one OSD node restarted very often due to a power supply problem, so I guess this may be related, but I am not quite sure how to fix it. Please help.

HEALTH_WARN 2 pgs backfilling; 2 pgs degraded; 2 pgs stuck unclean; recovery 741/129078 objects degraded (0.574%)
pg 3.1b is stuck unclean for 7081304.433870, current state active+degraded+remapped+backfilling, last acting [0,2]
pg 3.22 is stuck unclean for 7079410.875934, current state active+degraded+remapped+backfilling, last acting [0,2]
pg 3.22 is active+degraded+remapped+backfilling, acting [0,2]
pg 3.1b is active+degraded+remapped+backfilling, acting [0,2]
recovery 741/129078 objects degraded (0.574%)

[root@node-5e40 mail]# ceph pg dump_stuck unclean
ok
pg_stat objects mip degr unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
3.1b 343 0 401 0 2233819648 10001 10001 active+degraded+remapped+backfilling 2015-04-29 07:04:35.319063 2963'315257 2963:619403 [1,2,0] 1 [0,2] 0 2261'133643 2015-02-06 02:36:17.834174 2261'133007 2015-02-04 02:36:14.332644
3.22 319 0 368 0 2111692288 10001 10001 active+degraded+remapped+backfilling 2015-04-29 07:05:00.954251 2963'320956 2963:414625 [1,0,2] 1 [0,2] 0 2261'176758 2015-02-06 02:09:41.336774 2261'174335 2015-02-04 02:04:40.745000

"last_active": "2015-02-06 09:43:11.587526",
"last_clean": "2015-02-06 09:43:11.587526",

Best Regards
-- Ray
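A sketch of how one might triage this (the PG ids come from the output above; osd-max-backfills is the stock backfill throttle in Ceph of this era, and the value here is only an example):

    ceph pg 3.1b query        # recovery_state shows what backfill is waiting on
    ceph pg 3.22 query
    ceph -w                   # watch whether the degraded object count keeps falling
    ceph tell osd.* injectargs '--osd-max-backfills 4'   # optionally raise the backfill throttle

And since one node keeps dropping on power, setting noout around its reboots stops the cluster from marking its OSD out and remapping data every time it goes away (unset it as soon as the node is back up):

    ceph osd set noout
    ceph osd unset noout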
On Tue, Apr 28, 2015 at 10:59 PM, Ray Sun <xiaoq...@gmail.com> wrote:
> Sébastien,
> Thanks for your answer, I am a fan of your blog, it has really helped me a lot.
>
> I found there are two ways to do that.
>
> The first one is to use the command line, but after I tried
> ceph pg set_full_ratio 0.98
> it seems it did not work.
>
> Then I tried to modify ceph.conf and add
> mon osd full ratio = 0.98
> but I think that requires restarting the service, and I am really worried
> about whether the service can restart successfully.
>
> Please help.
>
> Best Regards
> -- Ray
>
> On Tue, Apr 28, 2015 at 10:54 PM, Sebastien Han <sebastien....@enovance.com> wrote:
>
>> You can try to push the full ratio a bit further and then delete some
>> objects.
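For reference, a restart-free way to do what Sebastien suggests on a cluster of this vintage (pre-Luminous) is to change the ratios at runtime; the values below are examples and should be lowered again once space has been freed, and osd-backfill-full-ratio is the separate knob that gates backfill_toofull:

    ceph pg set_full_ratio 0.98         # raises the full cutoff stored in the PGMap
    ceph pg set_nearfull_ratio 0.95
    ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.92'   # lets backfill_toofull PGs resume
    ceph pg dump | head                 # the header should show the full_ratio actually in force

In this era, mon osd full ratio in ceph.conf is only read when the PGMap is created, which would explain why editing the file (or even restarting) does not change the ratio on an existing cluster; set_full_ratio is the runtime path.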
>>
>> > On 28 Apr 2015, at 15:51, Ray Sun <xiaoq...@gmail.com> wrote:
>> >
>> > More detail about ceph health detail
>> > [root@controller ~]# ceph health detail
>> > HEALTH_ERR 20 pgs backfill_toofull; 20 pgs degraded; 20 pgs stuck unclean; recovery 7482/129081 objects degraded (5.796%); 2 full osd(s); 1 near full osd(s)
>> > pg 3.8 is stuck unclean for 7067109.597691, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
>> > pg 3.7d is stuck unclean for 1852078.505139, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
>> > pg 3.21 is stuck unclean for 7072842.637848, current state active+degraded+remapped+backfill_toofull, last acting [0,2]
>> > pg 3.22 is stuck unclean for 7070880.213397, current state active+degraded+remapped+backfill_toofull, last acting [0,2]
>> > pg 3.a is stuck unclean for 7067057.863562, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
>> > pg 3.7f is stuck unclean for 7067122.493746, current state active+degraded+remapped+backfill_toofull, last acting [0,2]
>> > pg 3.5 is stuck unclean for 7067088.369629, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
>> > pg 3.1e is stuck unclean for 7073386.246281, current state active+degraded+remapped+backfill_toofull, last acting [0,2]
>> > pg 3.19 is stuck unclean for 7068035.310269, current state active+degraded+remapped+backfill_toofull, last acting [0,2]
>> > pg 3.5d is stuck unclean for 1852078.505949, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
>> > pg 3.1a is stuck unclean for 7067088.429544, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
>> > pg 3.1b is stuck unclean for 7072773.771385, current state active+degraded+remapped+backfill_toofull, last acting [0,2]
>> > pg 3.3 is stuck unclean for 7067057.864514, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
>> > pg 3.15 is stuck unclean for 7067088.825483, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
>> > pg 3.11 is stuck unclean for 7067057.862408, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
>> > pg 3.6d is stuck unclean for 7067083.634454, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
>> > pg 3.6e is stuck unclean for 7067098.452576, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
>> > pg 3.c is stuck unclean for 5658116.678331, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
>> > pg 3.e is stuck unclean for 7067078.646953, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
>> > pg 3.20 is stuck unclean for 7067140.530849, current state active+degraded+remapped+backfill_toofull, last acting [0,2]
>> > pg 3.7d is active+degraded+remapped+backfill_toofull, acting [2,0]
>> > pg 3.7f is active+degraded+remapped+backfill_toofull, acting [0,2]
>> > pg 3.6d is active+degraded+remapped+backfill_toofull, acting [2,0]
>> > pg 3.6e is active+degraded+remapped+backfill_toofull, acting [2,0]
>> > pg 3.5d is active+degraded+remapped+backfill_toofull, acting [2,0]
>> > pg 3.20 is active+degraded+remapped+backfill_toofull, acting [0,2]
>> > pg 3.21 is active+degraded+remapped+backfill_toofull, acting [0,2]
>> > pg 3.22 is active+degraded+remapped+backfill_toofull, acting [0,2]
>> > pg 3.1e is active+degraded+remapped+backfill_toofull, acting [0,2]
>> > pg 3.19 is active+degraded+remapped+backfill_toofull, acting [0,2]
>> > pg 3.1a is active+degraded+remapped+backfill_toofull, acting [2,0]
>> > pg 3.1b is active+degraded+remapped+backfill_toofull, acting [0,2]
>> > pg 3.15 is active+degraded+remapped+backfill_toofull, acting [2,0]
>> > pg 3.11 is active+degraded+remapped+backfill_toofull, acting [2,0]
>> > pg 3.c is active+degraded+remapped+backfill_toofull, acting [2,0]
>> > pg 3.e is active+degraded+remapped+backfill_toofull, acting [2,0]
>> > pg 3.8 is active+degraded+remapped+backfill_toofull, acting [2,0]
>> > pg 3.a is active+degraded+remapped+backfill_toofull, acting [2,0]
>> > pg 3.5 is active+degraded+remapped+backfill_toofull, acting [2,0]
>> > pg 3.3 is active+degraded+remapped+backfill_toofull, acting [2,0]
>> > recovery 7482/129081 objects degraded (5.796%)
>> > osd.0 is full at 95%
>> > osd.2 is full at 95%
>> > osd.1 is near full at 93%
>> >
>> > Best Regards
>> > -- Ray
>> >
>> > On Tue, Apr 28, 2015 at 9:43 PM, Ray Sun <xiaoq...@gmail.com> wrote:
>> > Emergency help!
>> >
>> > One of my ceph clusters is full, and ceph -s returns:
>> > [root@controller ~]# ceph -s
>> >     cluster 059f27e8-a23f-4587-9033-3e3679d03b31
>> >      health HEALTH_ERR 20 pgs backfill_toofull; 20 pgs degraded; 20 pgs stuck unclean; recovery 7482/129081 objects degraded (5.796%); 2 full osd(s); 1 near full osd(s)
>> >      monmap e6: 4 mons at {node-5e40.cloud.com=10.10.20.40:6789/0,node-6670.cloud.com=10.10.20.31:6789/0,node-66c4.cloud.com=10.10.20.36:6789/0,node-fb27.cloud.com=10.10.20.41:6789/0}, election epoch 886, quorum 0,1,2,3 node-6670.cloud.com,node-66c4.cloud.com,node-5e40.cloud.com,node-fb27.cloud.com
>> >      osdmap e2743: 3 osds: 3 up, 3 in
>> >             flags full
>> >      pgmap v6564199: 320 pgs, 4 pools, 262 GB data, 43027 objects
>> >             786 GB used, 47785 MB / 833 GB avail
>> >             7482/129081 objects degraded (5.796%)
>> >                  300 active+clean
>> >                   20 active+degraded+remapped+backfill_toofull
>> >
>> > Then I tried to remove a volume, and got:
>> > [root@controller ~]# rbd -p volumes rm volume-c55fd052-212d-4107-a2ac-cf53bfc049be
>> > 2015-04-29 05:31:31.719478 7f5fb82f7760 0 client.4781741.objecter FULL, paused modify 0xe9a9e0 tid 6
>> >
>> > Please give me some tips on this, thanks a lot.
>> >
>> > Best Regards
>> > -- Ray
>>
>> Cheers.
>> ––––
>> Sébastien Han
>> Cloud Architect
>>
>> "Always give 100%. Unless you're giving blood."
>>
>> Phone: +33 (0)1 49 70 99 72
>> Mail: sebastien....@enovance.com
>> Address : 11 bis, rue Roquépine - 75008 Paris
>> Web : www.enovance.com - Twitter : @enovance
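The paused rbd rm above is expected while the osdmap full flag is set: a delete is itself a write, so the client objecter queues it. A sketch of the unblock-then-clean-up sequence, assuming the ratios were raised as in the earlier note (the volume name is the one from Ray's output):

    ceph health      # wait until 'flags full' / '2 full osd(s)' clears
    rbd -p volumes rm volume-c55fd052-212d-4107-a2ac-cf53bfc049be
    ceph df          # confirm pool usage is actually dropping
    ceph pg set_full_ratio 0.95   # put the cutoff back once there is headroom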
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com