Hello

I can't really find any hardware problems. I have done disk checks and
looked through the log files.
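The kind of checks I mean were along these lines (a sketch; /dev/sdb is just an example device name, substitute the disk backing the failing OSD):

```shell
DEV=/dev/sdb   # example device; substitute the disk backing the failing OSD

# SMART overall health and logged drive errors
smartctl -H "$DEV"
smartctl -l error "$DEV"

# Kernel messages mentioning I/O errors on sd* devices
dmesg | grep -iE 'error|fail' | grep -i sd

# Read-only filesystem check (XFS example; run against an unmounted device)
xfs_repair -n "$DEV"
```

None of these showed anything unusual here.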

Should the OSD fail with a core dump if there are hardware problems?

All my data seems intact; I only have:
HEALTH_ERR 915 pgs are stuck inactive for more than 300 seconds; 915 pgs down; 915 pgs peering; 915 pgs stuck inactive;
I guess it's due to the failing OSD.
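For anyone looking at similar pg states, the stuck pgs can be listed and traced back to the OSDs they map to with commands like these (a sketch; the pg id 1.2f is just an example, and output formats vary between releases):

```shell
# Detailed cluster health, including which pgs are affected
ceph health detail

# List pgs that have been stuck inactive
ceph pg dump_stuck inactive

# Query a specific pg to see its state and which OSDs it maps to
# (pg id is an example taken from the output above)
ceph pg 1.2f query
```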

I guess I could remove the OSD and add it back as a new one, but it's
always interesting to know what's actually wrong.
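For reference, the removal-and-re-add procedure would be roughly this (a sketch; osd.3 and /dev/sdb are placeholders, and the exact steps should be double-checked against the jewel documentation):

```shell
ID=3   # placeholder OSD id; substitute the id of the failing OSD

# Mark the OSD out and stop its daemon
ceph osd out osd.$ID
systemctl stop ceph-osd@$ID

# Remove it from the crush map, the auth database, and the osd map
ceph osd crush remove osd.$ID
ceph auth del osd.$ID
ceph osd rm osd.$ID

# Re-create the OSD on a fresh (or wiped) disk
ceph-disk prepare /dev/sdb   # example device
```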


Best Regards / Vänliga Hälsningar
*Martin Wilderoth*
*VD*
Enhagslingan 1B, 187 40 Täby

Direkt: +46 8 473 60 63
Mobil: +46 70 969 09 19
martin.wilder...@linserv.se
www.linserv.se

On 14 July 2016 at 06:14, Brad Hubbard <bhubb...@redhat.com> wrote:

> On Thu, Jul 14, 2016 at 06:06:58AM +0200, Martin Wilderoth wrote:
> >  Hello,
> >
> > I have a ceph cluster where one OSD is failing to start. I have been
> > upgrading ceph to see if the error disappeared. Now I'm running jewel
> but I
> > still get the error message.
> >
> >     -1> 2016-07-13 17:04:22.061384 7fda4d24e700  1 heartbeat_map
> is_healthy
> > 'OSD::osd_tp thread 0x7fda25dd8700' had suicide timed out after 150
>
> This appears to indicate that an OSD thread pool thread (work queue thread)
> has failed to complete an operation within the 150 second grace period.
>
> The most likely and common cause for this is hardware failure, and I would
> therefore suggest you thoroughly check this device and look for indicators
> in
> syslog, dmesg, diagnostics, etc. that this device may have failed.
>
> --
> HTH,
> Brad
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com