I would also like to see Ceph get smarter about inconsistent PGs. If we
can't automate the repair, at least the "ceph pg repair" command should
figure out which copy is correct and use that, instead of overwriting all
OSDs with whatever the primary has.

Is there really no way to get the expected CRC out of Ceph so I can tell
which copy of an object is correct, instead of inspecting the contents or
comparing copies from multiple OSDs?
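In the meantime, here is roughly how I have been locating the damaged
objects (only a sketch: the PG id, OSD number, and log path below are
placeholders and will differ per cluster):

  # list the PGs currently flagged inconsistent
  ceph health detail | grep inconsistent
  ceph pg dump | grep inconsistent

  # then grep the primary OSD's log for that PG's scrub errors,
  # e.g. pg 0.7f1 whose primary is osd.1:
  grep '0\.7f1' /var/log/ceph/ceph-osd.1.log | grep -Ei 'scrub|missing|inconsistent'

That still only tells me which objects are flagged, not which copy is
good, hence the CRC question above.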


On Mon, Jun 16, 2014 at 10:56 AM, Gregory Farnum <g...@inktank.com> wrote:

> On Mon, Jun 16, 2014 at 7:13 AM, Markus Blank-Burian <bur...@muenster.de>
> wrote:
> > I am also having inconsistent PGs (running ceph v0.80.1), where some
> > objects are missing. Excerpt from the logs (many similar lines):
> > "0.7f1 shard 66 missing a32857f1/10000129786.00000000/head//0"
>
> Shard...66? Really, that's what it says? Can you copy a few lines of the
> output?
>
>
> > The primary and one other replica hold only 453 MB of the PG's data,
> > but a third copy exists with 3.1 GB. The missing objects (identified by
> > filename) are also present on that third OSD. First attempt: I moved
> > "0.7f1_head" to a backup directory on both the first and second OSD.
> > This resulted in the same 453 MB copy, still with missing objects, on
> > the primary OSD. Shouldn't all the data have been copied back
> > automatically?
> >
> > So I tried to copy the whole PG directory "0.7f1_head" from the third
> > OSD to the primary. This results in the following assert:
> > 2014-06-16T15:49:29+02:00 kaa-96 ceph-osd:     -2> 2014-06-16
> > 15:49:29.046925 7f2e86b93780 10 osd.1 197813 pgid 0.7f1 coll
> > 0.7f1_head
> > 2014-06-16T15:49:29+02:00 kaa-96 ceph-osd:     -1> 2014-06-16
> > 15:49:29.047033 7f2e86b93780 10 filestore(/local/ceph)
> > collection_getattr /local/ceph/current/0.7f1_head 'info' = -61
> > 2014-06-16T15:49:29+02:00 kaa-96 ceph-osd:      0> 2014-06-16
> > 15:49:29.048966 7f2e86b93780 -1 osd/PG.cc: In function 'static epoch_t
> > PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&,
> > ceph::bufferlist*)' thread 7f2e86b93780 time 2014-06-16
> > 15:49:29.047045
> > osd/PG.cc: 2559: FAILED assert(r > 0)
> >
> >  ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
> >  1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&,
> > ceph::buffer::list*)+0x48d) [0x742a8b]
> >  2: (OSD::load_pgs()+0xda3) [0x64c419]
> >  3: (OSD::init()+0x780) [0x64e9ce]
> >  4: (main()+0x25d9) [0x602cbf]
> >
> > Am I missing something?
>
> This may be tangling up some of the other issues you're seeing, but it
> looks like you didn't preserve xattrs (at least on the directory).
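> For what it's worth, if you do copy a PG directory around by hand,
> something along these lines should carry the xattrs with it (just a
> sketch; the hostname is a placeholder, both OSDs should be stopped
> first, and it still won't carry any omap data, which lives in the OSD's
> leveldb rather than under the PG directory):
>
>   # run on the host that has the good copy; -a preserves ownership and
>   # times, -X copies the xattrs the filestore relies on
>   rsync -aX /local/ceph/current/0.7f1_head root@primary-host:/local/ceph/current/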
>
>
> > And wouldn't it be relatively easy to implement an option for "pg
> > repair" to choose one of the replica OSDs as the source instead of the
> > primary OSD?
>
> Umm, maybe. Tickets welcome!
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> >
> > It is still unclear where these inconsistencies (i.e. missing objects
> > / empty directories) come from; see also:
> > http://tracker.ceph.com/issues/8532.
> >
> >> On Fri, Jun 13, 2014 at 4:58 AM, Gregory Farnum <g...@inktank.com> wrote:
> >> The OSD should have logged the identities of the inconsistent objects
> >> to the central log on the monitors, as well as to its own local log
> >> file. You'll need to identify for yourself which version is correct,
> >> which will probably involve going and looking at them inside each
> >> OSD's data store. If the primary is correct for all the objects in a
> >> PG, you can just run repair; otherwise you'll want to copy the
> >> replica's copy to the primary. Sorry. :/
> >> (If you have no way of checking yourself which is correct, and you
> >> have more than 2 replicas, you can compare the stored copies and just
> >> take the one held by the majority — that's probably correct.)
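> >> (For that by-hand comparison, something like the following can work;
> >> it's only a sketch, and the data path, $ID, ${PGID} and OBJNAME are
> >> placeholders to fill in per OSD:
> >>
> >>   find /var/lib/ceph/osd/ceph-$ID/current/${PGID}_head \
> >>       -name '*OBJNAME*' -exec md5sum {} \;
> >>
> >> Run it on each host that stores the PG and keep whichever version the
> >> checksums agree on.)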
> >> -Greg
> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
> >>
> >>
> >> On Thu, Jun 12, 2014 at 7:27 PM, Aaron Ten Clay <aaro...@aarontc.com> wrote:
> >>> I'm having trouble finding a concise set of steps to repair
> >>> inconsistent placement groups. I know from other threads that issuing
> >>> a 'ceph pg repair ...' command could cause loss of data integrity if
> >>> the primary OSD happens to have the bad copy of the placement group.
> >>> I know how to find which PGs are bad (ceph pg dump), but I'm not sure
> >>> how to figure out which objects in the PG failed their CRCs during
> >>> the deep scrub, and I'm not sure how to get the correct CRC so I can
> >>> determine which OSD holds the correct copy.
> >>>
> >>> Maybe I'm on the wrong path entirely? If someone knows how to resolve
> >>> this, I'd appreciate some insight. I think this would be a good topic
> >>> for adding to the OSD/PG operations section of the manual, or at
> >>> least a wiki article.
> >>>
> >>> Thanks!
> >>> -Aaron
> >>>



-- 
Aaron Ten Clay
http://www.aarontc.com/