Re: [ceph-users] Not recovering completely on OSD failure

2013-11-08 Thread Gregory Farnum
This is probably a result of some difficulties that CRUSH has when a pool's size equals the total number of buckets it can choose from. We made some changes to the algorithm earlier this year to deal with this, but if you are using a kernel client you need a very new one to be compatible, so we haven't enabled them by default.
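
The behavior in question is governed by the CRUSH tunables carried in the crush map; raising the retry-related tunables gives CRUSH more attempts to find a distinct bucket when the pool size matches the bucket count. A minimal sketch of how one might inspect and change them, assuming a Dumpling-era cluster and shell access to a monitor node (file names here are arbitrary):

    # Dump the current CRUSH map and decompile it to text:
    $ ceph osd getcrushmap -o crushmap.bin
    $ crushtool -d crushmap.bin -o crushmap.txt

    # The tunables appear at the top of crushmap.txt, e.g.:
    #   tunable choose_local_tries 0
    #   tunable choose_total_tries 50
    #   tunable chooseleaf_descend_once 1

    # After editing crushmap.txt, recompile and inject the new map:
    $ crushtool -c crushmap.txt -o crushmap.new
    $ ceph osd setcrushmap -i crushmap.new

    # Or switch to the current recommended tunable profile in one step
    # (may be incompatible with old kernel clients, per the note above):
    $ ceph osd crush tunables optimal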

[ceph-users] Not recovering completely on OSD failure

2013-11-08 Thread Niklas Goerke
Hi guys,

This is probably a configuration error, but I just can't find it. The following reproducibly happens on my cluster [1]:

15:52:15 On Host1 one disk is removed on the RAID controller (to Ceph it looks as if the disk died)
15:52:52 OSD reported missing (osd.47)
15:52:53 osdmap eXXX
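
For anyone chasing a similar symptom, a few standard commands can help narrow down why recovery stalls after an OSD failure. This is a sketch only; osd.47 is taken from the log above, and older releases label the pool replica count "rep size" rather than "size":

    # Overall health plus the specific PGs that are not active+clean:
    $ ceph health detail
    $ ceph pg dump_stuck unclean

    # Confirm the failed OSD is marked down/out and check the bucket layout:
    $ ceph osd tree

    # Compare each pool's replication size against the number of hosts,
    # since CRUSH can struggle when the size equals the bucket count:
    $ ceph osd dump | grep 'rep size'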