The OSD should have logged the identities of the inconsistent objects to the central log on the monitors, as well as to its own local log file. You'll need to determine for yourself which version is correct, which will probably involve looking at the objects inside each OSD's data store. If the primary is correct for all the objects in a PG, you can just run repair; otherwise you'll want to copy the replica's copy to the primary. Sorry. :/

(If you have no way of checking yourself which is correct, and you have more than 2 replicas, you can compare the stored copies and just take the one held by the majority; that's probably correct.)

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
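
The majority comparison described above could be sketched roughly as follows. This is only an illustration, not a Ceph tool: it assumes you have already pulled each OSD's copy of the object out of its data store by hand, and the names `majority_copy` and `copies` are made up for the example. It hashes each copy with SHA-256 and reports which OSDs agree with the majority.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional, Tuple


def majority_copy(copies: Dict[str, bytes]) -> Optional[Tuple[str, List[str], List[str]]]:
    """Pick the object content held by a strict majority of replicas.

    `copies` maps a (hypothetical) OSD label such as "osd.3" to the raw bytes
    of that OSD's stored copy, gathered manually beforehand.
    Returns (digest, agreeing_osds, dissenting_osds), or None when there are
    fewer than 3 copies or no content is held by more than half of them.
    """
    if len(copies) < 3:
        # With only 2 replicas a disagreement cannot be settled by voting.
        return None

    digests = {osd: hashlib.sha256(data).hexdigest() for osd, data in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]

    if votes * 2 <= len(copies):
        # No strict majority, so the vote tells us nothing.
        return None

    agreeing = sorted(osd for osd, d in digests.items() if d == winner)
    dissenting = sorted(osd for osd, d in digests.items() if d != winner)
    return winner, agreeing, dissenting


if __name__ == "__main__":
    # Made-up contents standing in for the three stored copies of one object.
    example = {"osd.0": b"payload-v2", "osd.1": b"payload-v1", "osd.2": b"payload-v2"}
    print(majority_copy(example))
```

If the primary shows up in the dissenting list, a plain repair is the wrong move; that is the case where you'd copy a replica's copy to the primary first. A majority is only "probably correct", so treat the result as a hint rather than proof.
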

On Thu, Jun 12, 2014 at 7:27 PM, Aaron Ten Clay <aaro...@aarontc.com> wrote:
> I'm having trouble finding a concise set of steps to repair inconsistent
> placement groups. I know from other threads that issuing a 'ceph pg repair
> ...' command could cause loss of data integrity if the primary OSD happens
> to have the bad copy of the placement group. I know how to find which PG's
> are bad (ceph pg dump), but I'm not sure how to figure out which objects in
> the PG failed their CRCs during the deep scrub, and I'm not sure how to get
> the correct CRC so I can determine which OSD holds the correct copy.
>
> Maybe I'm on the wrong path entirely? If someone knows how to resolve this,
> I'd appreciate some insight. I think this would be a good topic for adding
> to the OSD/PG operations section of the manual, or at least a wiki article.
>
> Thanks!
> -Aaron
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com