Thanks! I tried restarting osd.11 (the primary OSD for the incomplete pg) and
that helped a lot: we went from 0/1 op/s to 10-800+ op/s!

We still have "HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs stuck
unclean", but at least we can use our cluster :-)

ceph pg dump_stuck inactive
ok
pg_stat           2.1f6
objects           118
mip               0
degr              0
unf               0
bytes             403118080
log               0
disklog           0
state             incomplete
state_stamp       2013-07-30 06:08:18.883179
v                 11127'11658123
reported          12914'1506
up                [11,9]
acting            [11,9]
last_scrub        10321'11641837
scrub_stamp       2013-07-28 00:59:09.552640
last_deep_scrub   10321'11641837
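For anyone hitting the same state, the procedure Jens suggested below can be sketched in shell. The sample line and the field extraction are illustrative (a simplified dump line, not full output), and the restart invocation is an assumption that varies by distro: sysvinit-era installs use "service ceph restart osd.N", while systemd installs use "systemctl restart ceph-osd@N".

```shell
# A (simplified) stuck-pg line from "ceph pg dump_stuck inactive".
# The acting set is the bracketed list; the primary osd is listed first.
line='2.1f6 118 0 0 0 403118080 0 0 incomplete [11,9] [11,9]'

# Pull out the primary osd id: first bracketed group, first element.
primary=$(printf '%s\n' "$line" | grep -o '\[[0-9,]*\]' | head -1 | tr -d '[]' | cut -d, -f1)
echo "osd.$primary"    # prints "osd.11"

# Restart that osd on its host (sysvinit-style; on systemd installs
# this would be: systemctl restart ceph-osd@$primary):
# service ceph restart osd.$primary
```

After the restart, "ceph -w" or "ceph pg dump_stuck inactive" shows whether the pg has peered and whether op/s recover.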

Thanks again!
                Jeff


On Tue, Jul 30, 2013 at 11:44:58AM +0200, Jens Kristian Søgaard wrote:
> Hi,
>
>> This is the same issue as yesterday, but I'm still searching for a  
>> solution.  We have a lot of data on the cluster that we need and can't  
>>    health HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs 
>
> I'm not claiming to have an answer, but I have a suggestion you can try.
>
> Try running "ceph pg dump" to list all the pgs. Grep for ones that are  
> inactive / incomplete. Note which osds they are on - it is listed in the  
> square brackets with the primary being the first in the list.
>
> Now try restarting the primary osd for the stuck pg and see if that  
> could possibly shift things into place.
>
> -- 
> Jens Kristian Søgaard, Mermaid Consulting ApS,
> j...@mermaidconsulting.dk,
> http://www.mermaidconsulting.com/

-- 
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
