Re: [ceph-users] two osd stack on peereng after start osd to recovery

Dominik Mostowiec Fri, 28 Jun 2013 14:34:59 -0700

Ver. 0.56.6
Hmm, the osd has not died; one or more pgs are stuck in peering on it.

Regards
Dominik
On Jun 28, 2013 11:28 PM, "Sage Weil" <s...@inktank.com> wrote:


> On Sat, 29 Jun 2013, Andrey Korolyov wrote:
> > There is almost the same problem with a 0.61 cluster, at least with the
> > same symptoms. It can be reproduced quite easily: remove an osd and then
> > mark it as out, and with quite high probability one of its neighbors will
> > be stuck at the end of the peering process with a couple of peering pgs
> > whose primary copy is on it. Such an osd process seems to be stuck in some
> > kind of lock, eating exactly 100% of one core.
>
> Which version?
> Can you attach with gdb and get a backtrace to see what it is chewing on?
>
> Thanks!
> sage
>
>
> >
> > On Thu, Jun 13, 2013 at 8:42 PM, Gregory Farnum <g...@inktank.com> wrote:
> > > On Thu, Jun 13, 2013 at 6:33 AM, Sławomir Skowron <szi...@gmail.com> wrote:
> > >> Hi, sorry for the late response.
> > >>
> > >> https://docs.google.com/file/d/0B9xDdJXMieKEdHFRYnBfT3lCYm8/view
> > >>
> > >> Logs in attachment, and on google drive, from today.
> > >>
> > >> https://docs.google.com/file/d/0B9xDdJXMieKEQzVNVHJ1RXFXZlU/view
> > >>
> > >> We have this problem today, and the new logs are on Google Drive with
> > >> today's date.
> > >>
> > >> Strangely, the problematic osd.71 has about 10-15% more space used
> > >> than the other osds in the cluster.
> > >>
> > >> Today, within one hour, osd.71 failed 3 times according to the mon log,
> > >> and after the third failure recovery got stuck and many 500 errors
> > >> appeared in the http layer on top of rgw. When it is stuck, restarting
> > >> osd.71, osd.23, and osd.108, all from the stuck pg, helps, but I even
> > >> ran a repair on this osd, just in case.
> > >>
> > >> I have a theory that this pg holds the rgw index of objects, or that
> > >> one of the osds in this pg has some problem with its local filesystem
> > >> or the drive below it (the raid controller reports nothing about that),
> > >> but I do not see any problem in the system.
> > >>
> > >> How can we find which pg/osd holds the index of objects for an rgw
> > >> bucket?
> > >
> > > You can find the location of any named object by grabbing the OSD map
> > > from the cluster and using the osdmaptool: "osdmaptool <mapfile>
> > > --test-map-object <objname> --pool <poolid>".
> > >
> > > You're not providing any context for your issue though, so we really
> > > can't help. What symptoms are you observing?
> > > -Greg
> > > Software Engineer #42 @ http://inktank.com | http://ceph.com
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
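
The two diagnostic steps suggested in the quoted replies (a gdb backtrace of
the spinning osd process, and osdmaptool to find where an object lives) could
be sketched as the shell helpers below. This is only a sketch under
assumptions: the function names, the DRY_RUN switch, and the default
/tmp/osdmap path are illustrative and do not come from the thread, and the
commands need gdb and the ceph tools installed on the host.

```shell
#!/bin/sh
# Sketch only: helper names, DRY_RUN and the default map path are assumptions.

# Print a command, then run it unless DRY_RUN=1 is set.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# Attach gdb to a running ceph-osd process and dump backtraces of all threads.
# $1: pid of the osd process that is spinning at 100% of one core.
osd_backtrace() {
    pid="$1"
    run gdb -p "$pid" -batch -ex "thread apply all bt"
}

# Save the current OSD map to a file, then ask osdmaptool which pg and osds
# a named object maps to.
# $1: object name, $2: pool id, $3: where to save the map (optional).
locate_object() {
    objname="$1"
    poolid="$2"
    mapfile="${3:-/tmp/osdmap}"
    run ceph osd getmap -o "$mapfile"
    run osdmaptool "$mapfile" --test-map-object "$objname" --pool "$poolid"
}
```

Setting DRY_RUN=1 before calling either helper only prints the commands, which
is a cheap way to check them before touching a production cluster. The pid,
object name, and pool id have to be supplied by whoever runs it.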

