[ceph-users] pgs stuck in 'incomplete' state, blocked ops, query command hangs

Lincoln Bryant Tue, 21 Oct 2014 07:41:25 -0700

Hi cephers,

We have two pgs that are stuck in 'incomplete' state across two different 
pools: 
pg 2.525 is stuck inactive since forever, current state incomplete, last acting [55,89]
pg 0.527 is stuck inactive since forever, current state incomplete, last acting [55,89]
pg 0.527 is stuck unclean since forever, current state incomplete, last acting [55,89]
pg 2.525 is stuck unclean since forever, current state incomplete, last acting [55,89]
pg 0.527 is incomplete, acting [55,89]
pg 2.525 is incomplete, acting [55,89]
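
For anyone who wants to condense health output like the above, here is a minimal Python sketch that groups the "stuck" lines by pg. It assumes the line format is exactly what is pasted in this message; the function name summarize_stuck_pgs and the regular expression are illustrative, not part of any Ceph tool.

```python
import re
from collections import defaultdict

# Matches lines of the form pasted above, e.g.
#   pg 2.525 is stuck inactive since forever, current state incomplete, last acting [55,89]
# The format is assumed from this message only.
STUCK_RE = re.compile(
    r"pg (?P<pgid>\d+\.[0-9a-f]+) is stuck (?P<kind>\w+) [^,]*, "
    r"current state (?P<state>[\w+]+), last acting \[(?P<acting>[\d,]*)\]"
)

def summarize_stuck_pgs(text):
    """Return {pgid: {"state", "acting", "stuck"}} for every 'stuck' line."""
    # Collapse whitespace so entries wrapped by a mail client still match.
    flat = " ".join(text.split())
    summary = defaultdict(lambda: {"state": None, "acting": [], "stuck": set()})
    for m in STUCK_RE.finditer(flat):
        entry = summary[m["pgid"]]
        entry["state"] = m["state"]
        entry["acting"] = [int(x) for x in m["acting"].split(",") if x]
        entry["stuck"].add(m["kind"])
    return dict(summary)
```

Fed the output above, this would report both pgs as 'incomplete' with acting set [55, 89], each flagged as inactive and unclean.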


Basically, we ran into a problem where we had 2x replication and 2 disks on 
different machines died near-simultaneously, leaving our pgs stuck in 
'down+peering'. I had to do some combination of declaring the OSDs as lost and 
running 'force_create_pg'. I realize the data on those pgs is now lost, but I'm 
stuck as to how to get the pgs out of 'incomplete'. 

I also see many ops blocked on the primary OSD for these:
100 ops are blocked > 67108.9 sec
100 ops are blocked > 67108.9 sec on osd.55
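
To put the blocked-ops figures in perspective, the following Python sketch tallies the per-OSD lines and converts the age to hours. It assumes the line format shown above; blocked_ops_by_osd is a hypothetical helper name, not a Ceph command or API.

```python
import re

# Matches e.g. "100 ops are blocked > 67108.9 sec on osd.55".
# The cluster-wide line without "on osd.N" is matched too, then skipped.
BLOCKED_RE = re.compile(
    r"(?P<count>\d+) ops are blocked > (?P<secs>\d+(?:\.\d+)?) sec"
    r"(?: on osd\.(?P<osd>\d+))?"
)

def blocked_ops_by_osd(text):
    """Return {osd_id: {"count": total ops, "hours": oldest age in hours}}."""
    result = {}
    for m in BLOCKED_RE.finditer(text):
        if m["osd"] is None:
            continue  # cluster-wide total, not tied to a single OSD
        entry = result.setdefault(int(m["osd"]), {"count": 0, "hours": 0.0})
        entry["count"] += int(m["count"])
        entry["hours"] = max(entry["hours"], float(m["secs"]) / 3600)
    return result
```

For the two lines above, this attributes all 100 blocked ops to osd.55, with an age of more than 18 hours.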

However, osd.55 is a new disk. If I 'ceph osd out osd.55', the pgs move to 
another OSD and the new primary gets blocked ops. Restarting osd.55 does 
nothing. Other pgs on osd.55 seem okay.

I would attach the result of a query, but if I run 'ceph pg 2.525 query', the 
command hangs until I ctrl-c:

ceph pg 2.525 query
^CError EINTR: problem getting command descriptions from pg.2.525

I've also tried 'ceph pg repair 2.525', which does nothing.

Any thoughts here? Are my pools totally sunk? 

Thanks,
Lincoln