> 
> I have 4 physical boxes, each running 2 OSDs. I needed to retire one, so I
> set its 2 OSDs to 'out' and everything went as expected. Then I noticed
> that 'ceph health' was reporting that my CRUSH map had legacy tunables. The
> release notes told me I needed to run 'ceph osd crush tunables optimal' to
> fix this, and since I wasn't running any old kernel clients, I made it so.
> Shortly after that, my OSDs started dying until only one remained. I
> eventually figured out that they would stay up until I started the OSDs on
> the 'out' node. I hadn't made the connection to the tunables until I turned
> up an old mailing list post, but sure enough, setting the tunables back to
> legacy got everything stable again. I assume that the churn introduced by
> 'optimal' resulted in a situation where the 'out' node held the only copy
> of some data, because there were down PGs until I got all the OSDs running
> again.
> 
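
For reference, the sequence of commands involved was roughly the following
(the osd ids are just examples):

# mark the two OSDs on the node being retired as 'out'
ceph osd out 6
ceph osd out 7

# what the release notes suggested, and what preceded the crashes
ceph osd crush tunables optimal

# reverting to the old behaviour, which got things stable again
ceph osd crush tunables legacy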

Forgot to add: on the 'out' node, the following would be logged in the OSD logfile:

7f5688e59700 -1 osd/PG.cc: In function 'void PG::fulfill_info(pg_shard_t, const pg_query_t&, std::pair<pg_shard_t, pg_info_t>&)' thread 7f5688e59700 time 2014-07-05 21:47:51.595687
osd/PG.cc: 4424: FAILED assert(from == primary)

and in the others when they crashed:

7fdcb9600700 -1 osd/PG.cc: In function 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)' thread 7fdcb9600700 time 2014-07-05 21:14:57.260547
osd/PG.cc: 5307: FAILED assert(0 == "we got a bad state machine event")
(Sometimes that assert would appear on the 'out' node too.)
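
If it helps with diagnosis, debug logging can be turned up temporarily on the
running OSDs with something like the following (the log levels are just
examples):

ceph tell osd.* injectargs '--debug-osd 20 --debug-ms 1'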

Even after the rebalance is complete and the old node is fully retired, with
one node down and 2 still running (as a test) I get a very small number
(0.006%) of "unfound" PGs. This is a bit of a worry...
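
In case it's useful, the unfound PGs can be listed and examined with something
like this (the pg id 2.5 below is just a placeholder):

ceph health detail            # names the PGs that are unfound/stuck
ceph pg dump_stuck unclean    # lists PGs that aren't active+clean
ceph pg 2.5 query             # peering/recovery state of one PG
ceph pg 2.5 list_unfound      # objects the OSDs can't locate
# last resort only: ceph pg 2.5 mark_unfound_lost revert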

James
