I'm seeing one OSD spamming its log with:
2014-04-02 16:49:21.547339 7f5cc6c5d700 1 heartbeat_map is_healthy
'OSD::op_tp thread 0x7f5cc3456700' had timed out after 15
This starts about 30 seconds after the OSD daemon is started, and it
continues until:
2014-04-02 16:48:57.526925 7f0e5a683700 1 heartbeat_map is_healthy
'OSD::op_tp thread 0x7f0e3c857700' had suicide timed out after 150
2014-04-02 16:48:57.528008 7f0e5a683700 -1 common/HeartbeatMap.cc: In
function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*,
const char*, time_t)' thread 7f0e5a683700 time 2014-04-02 16:48:57.526948
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
I tried bumping up the logging levels and didn't see anything
interesting. I also tried strace, and all I can really see is that the
OSD spends most of its time in FUTEX_WAIT.
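For reference, the strace invocation was roughly this (the PID is just
an example; I attached to the running ceph-osd and followed its threads):

    # Follow all threads and only show futex calls:
    strace -f -e trace=futex -p 12345

    # Or summarize time spent per syscall instead:
    strace -f -c -p 12345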
This OSD has been flapping for several days now. None of the other OSDs
are having this issue.
I thought it might be similar to Quenten Grasso's post about 'OSD
Restarts cause excessively high load average and "requests are blocked >
32 sec"'. At first it looked similar, but Quenten said his OSDs
eventually settle down. Mine never does.
Can I increase that 15-second timeout, to see if the OSD just needs
additional time? I don't see anything about it in the Ceph docs.
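If those 15 / 150 second values map to osd op thread timeout and osd op
thread suicide timeout (that's an assumption on my part, not something
I've confirmed in the docs), I'd guess something like this would raise
them for just this OSD:

    # Runtime change for only the problem OSD (osd.12 is a placeholder):
    ceph tell osd.12 injectargs '--osd-op-thread-timeout 60 --osd-op-thread-suicide-timeout 600'

    # Or persistently in ceph.conf under that OSD's section, then restart it:
    [osd.12]
        osd op thread timeout = 60
        osd op thread suicide timeout = 600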
Otherwise, I'm pretty close to removing the disk, zapping it, and adding
it back to the cluster (rough plan below). Any other suggestions?
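If it does come to that, my plan is roughly the usual out / remove /
zap / re-create sequence, something like this (osd.12 and /dev/sdb are
placeholders for the real OSD id and device):

    ceph osd out 12
    # (wait for recovery to finish, then stop the osd.12 daemon via your init system)
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12
    # wipe the disk and re-create the OSD on it
    ceph-disk zap /dev/sdb
    ceph-disk prepare /dev/sdb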
--
*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com
*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website <http://www.centraldesktop.com/> | Twitter
<http://www.twitter.com/centraldesktop> | Facebook
<http://www.facebook.com/CentralDesktop> | LinkedIn
<http://www.linkedin.com/groups?gid=147417> | Blog
<http://cdblog.centraldesktop.com/>