I am also seeing inconsistent PGs (running ceph v0.80.1) where some objects are missing. Excerpt from the logs (many similar lines):

    0.7f1 shard 66 missing a32857f1/10000129786.00000000/head//0
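For reference, this is roughly how I pulled the affected PG and object names out. The log paths are the Ceph defaults and may differ on other setups; osd.1 is the primary here (taken from the assert output further down):

    # which PGs are inconsistent, and which OSDs they map to
    ceph health detail | grep inconsistent
    ceph pg dump | grep inconsistent

    # the per-object scrub errors end up in the cluster log on the monitors
    # and in the primary OSD's local log
    grep 0.7f1 /var/log/ceph/ceph.log | grep missing
    grep 0.7f1 /var/log/ceph/ceph-osd.1.log | grep missing

    # if the messages have already rotated away, a new deep scrub regenerates them
    ceph pg deep-scrub 0.7f1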
The copy on the primary OSD and one replica hold only 453MB of the PG's data, but a third copy exists with 3.1GB. The objects referenced in the log (identified by filename) are present on that third OSD.

First try: move "0.7f1_head" to a backup directory on both the first and the second OSD. This resulted in the same 453MB copy with the missing objects on the primary OSD. Shouldn't all the data have been copied over automatically?

So I tried to copy the whole PG directory "0.7f1_head" from the third OSD to the primary instead. This resulted in the following assert when the OSD started:

2014-06-16T15:49:29+02:00 kaa-96 ceph-osd: -2> 2014-06-16 15:49:29.046925 7f2e86b93780 10 osd.1 197813 pgid 0.7f1 coll 0.7f1_head
2014-06-16T15:49:29+02:00 kaa-96 ceph-osd: -1> 2014-06-16 15:49:29.047033 7f2e86b93780 10 filestore(/local/ceph) collection_getattr /local/ceph/current/0.7f1_head 'info' = -61
2014-06-16T15:49:29+02:00 kaa-96 ceph-osd: 0> 2014-06-16 15:49:29.048966 7f2e86b93780 -1 osd/PG.cc: In function 'static epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&, ceph::bufferlist*)' thread 7f2e86b93780 time 2014-06-16 15:49:29.047045
osd/PG.cc: 2559: FAILED assert(r > 0)
 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
 1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&, ceph::buffer::list*)+0x48d) [0x742a8b]
 2: (OSD::load_pgs()+0xda3) [0x64c419]
 3: (OSD::init()+0x780) [0x64e9ce]
 4: (main()+0x25d9) [0x602cbf]

Am I missing something? And wouldn't it be relatively easy to add an option to "ceph pg repair" that chooses one of the replica OSDs as the source instead of the primary? (A guess at why the copy blew up, and how I am comparing the three copies, is in the PS below the quoted thread.)

It is still unclear where these inconsistencies (i.e. missing objects / empty directories) come from; see also http://tracker.ceph.com/issues/8532.

On Fri, Jun 13, 2014 at 4:58 AM, Gregory Farnum <g...@inktank.com> wrote:
> The OSD should have logged the identities of the inconsistent objects
> to the central log on the monitors, as well as to its own local log
> file. You'll need to identify for yourself which version is correct,
> which will probably involve going and looking at them inside each
> OSD's data store. If the primary is correct for all the objects in a
> PG, you can just run repair; otherwise you'll want to copy the
> replica's copy to the primary. Sorry. :/
> (If you have no way of checking yourself which is correct, and you
> have more than 2 replicas, you can compare the stored copies and just
> take the one held by the majority — that's probably correct.)
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Thu, Jun 12, 2014 at 7:27 PM, Aaron Ten Clay <aaro...@aarontc.com> wrote:
>> I'm having trouble finding a concise set of steps to repair inconsistent
>> placement groups. I know from other threads that issuing a 'ceph pg repair
>> ...' command could cause loss of data integrity if the primary OSD happens
>> to have the bad copy of the placement group. I know how to find which PGs
>> are bad (ceph pg dump), but I'm not sure how to figure out which objects in
>> the PG failed their CRCs during the deep scrub, and I'm not sure how to get
>> the correct CRC so I can determine which OSD holds the correct copy.
>>
>> Maybe I'm on the wrong path entirely? If someone knows how to resolve this,
>> I'd appreciate some insight. I think this would be a good topic for adding
>> to the OSD/PG operations section of the manual, or at least a wiki article.
>>
>> Thanks!
>> -Aaron
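PS: a guess (untested, so take it as an assumption) at why the hand-copied PG directory trips the assert above: collection_getattr returning -61 (ENODATA) looks like a missing extended attribute on the 0.7f1_head collection directory, and a plain recursive copy does not preserve xattrs. With the OSD stopped, something like the following would show whether the PG metadata attrs survived, and redo the copy with xattrs intact ("third-osd-host" is a placeholder for the host holding the 3.1GB copy):

    # dump all xattrs on the collection directory; compare the healthy third
    # OSD against the freshly copied primary, the attrs should match
    getfattr -d -m - /local/ceph/current/0.7f1_head

    # re-copy from the third OSD, this time preserving xattrs
    # (-a = archive mode, -X = keep extended attributes)
    rsync -aX third-osd-host:/local/ceph/current/0.7f1_head/ \
          /local/ceph/current/0.7f1_head/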
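PPS: for Greg's suggestion of comparing the stored copies and taking the majority, this is roughly what I am using. It assumes the same /local/ceph filestore layout on every host; "host-b" and "host-c" are placeholders for the other two OSD hosts, and the basename trick is there because the hashed DIR_* layout below the PG directory can differ between OSDs:

    # build a "checksum object-name" list of the PG's contents on each host
    for h in kaa-96 host-b host-c; do
        ssh "$h" "cd /local/ceph/current/0.7f1_head && find . -type f -exec md5sum {} +" \
            | awk '{ n = split($2, p, "/"); print $1, p[n] }' | sort -k2 > "/tmp/$h.sums"
    done

    # copies that agree diff clean; the odd one out is the broken copy
    diff /tmp/kaa-96.sums /tmp/host-b.sums
    diff /tmp/kaa-96.sums /tmp/host-c.sums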