Hello, I don't really find any hardware problems. I have done disk checks and looked at the log files.
Should the OSD fail with a core dump if there are hardware problems? All my data seems intact; I only have:

HEALTH_ERR 915 pgs are stuck inactive for more than 300 seconds; 915 pgs down; 915 pgs peering; 915 pgs stuck inactive

I guess it's due to the failing OSD. I could remove the OSD and add it as a new one, but it's always interesting to know what's actually wrong.

/Regards Martin

Best Regards / Vänliga Hälsningar

*Martin Wilderoth*
*VD* (CEO)
Enhagslingan 1B, 187 40 Täby
Direct: +46 8 473 60 63
Mobile: +46 70 969 09 19
martin.wilder...@linserv.se
www.linserv.se

On 14 July 2016 at 06:14, Brad Hubbard <bhubb...@redhat.com> wrote:

> On Thu, Jul 14, 2016 at 06:06:58AM +0200, Martin Wilderoth wrote:
> > Hello,
> >
> > I have a ceph cluster where one OSD is failing to start. I have been
> > upgrading ceph to see if the error disappeared. Now I'm running jewel but I
> > still get the error message.
> >
> > -1> 2016-07-13 17:04:22.061384 7fda4d24e700 1 heartbeat_map is_healthy
> > 'OSD::osd_tp thread 0x7fda25dd8700' had suicide timed out after 150
>
> This appears to indicate that an OSD thread pool thread (work queue thread)
> has failed to complete an operation within the 150 second grace period.
>
> The most likely and common cause for this is hardware failure, and I would
> therefore suggest you thoroughly check this device and look for indicators in
> syslog, dmesg, diagnostics, etc. that this device may have failed.
>
> --
> HTH,
> Brad
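For readers following along: the hardware checks Brad suggests, and the OSD replacement Martin mentions, are usually done with commands along these lines. This is a hedged sketch, not a verified procedure; the device path /dev/sdX and the OSD id N are placeholders you must substitute for your own cluster, and you should consult the Ceph documentation for your release before removing an OSD.

```shell
# Check the underlying disk for hardware trouble (assumes /dev/sdX is
# the data device backing the failing OSD):
smartctl -a /dev/sdX                     # SMART health status and error counters
dmesg | grep -iE 'sdX|i/o error|ata'     # kernel-level I/O errors for the device
grep -i 'sdX' /var/log/syslog            # anything the system logger caught

# If the disk really has failed, removing the OSD and re-adding it as a
# new one looks roughly like this (N is the failing OSD's id):
ceph osd out N
systemctl stop ceph-osd@N
ceph osd crush remove osd.N
ceph auth del osd.N
ceph osd rm N
# ...then prepare and activate a replacement OSD on a healthy disk.
```

Note that a suicide-timeout abort is deliberate: the heartbeat map kills a thread that exceeds its grace period, so an intact data set alongside the crash is consistent with a disk that stalls rather than returns errors.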
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com