Re: [ceph-users] Hit suicide timeout on osd start

Игорь Лукьянов Thu, 12 Sep 2013 13:28:28 -0700

To be more precise, not only was debug_ms changed from 0 to 1, but debug_osd
was also changed from 0 to 50.
It looks like a race condition somewhere under the hood that had been
concealed by logging delays and was revealed only after those delays were
removed (by setting debug ms/osd to 0).



2013/9/12 Andrey Korolyov <and...@xdel.ru>

> A little follow-up:
>
> One of the cluster nodes (from the not-yet-restarted set) went into some
> kind of flapping state, showing CPU consumption peaks and latency spikes
> every 50 seconds. Even more interesting: when we injected a non-zero
> debug_ms, the latency spikes went away, but the CPU peaks remained. In the
> picture[0] below, we injected debug_ms 1 with the log file set to /dev/null
> at 19:03 and set it back to 0 at 19:13.
>
> 0. http://i.imgur.com/8BBWM7o.png
>
>
> On Wed, Sep 11, 2013 at 5:05 AM, Andrey Korolyov <and...@xdel.ru> wrote:
> > Hello,
> >
> > Hit the well-known error on 0.61.8, caused by nothing more than a slight
> > disk overload during OSD daemon start. I currently have very large
> > metadata per OSD (about 20G); this may be the issue.
> >
> > #0  0x00007f2f46adeb7b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
> > #1  0x0000000000860469 in reraise_fatal (signum=6) at global/signal_handler.cc:58
> > #2  handle_fatal_signal (signum=6) at global/signal_handler.cc:104
> > #3  <signal handler called>
> > #4  0x00007f2f44b45405 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> > #5  0x00007f2f44b48b5b in abort () from /lib/x86_64-linux-gnu/libc.so.6
> > #6  0x00007f2f4544389d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > #7  0x00007f2f45441996 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > #8  0x00007f2f454419c3 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > #9  0x00007f2f45441bee in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > #10 0x000000000090d2fa in ceph::__ceph_assert_fail (assertion=0xa38ab1 "0 == \"hit suicide timeout\"", file=<optimized out>, line=79, func=0xa38c60 "bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)") at common/assert.cc:77
> > #11 0x000000000087914b in ceph::HeartbeatMap::_check (this=this@entry=0x26560e0, h=<optimized out>, who=who@entry=0xa38b40 "is_healthy", now=now@entry=1378860192) at common/HeartbeatMap.cc:79
> > #12 0x0000000000879956 in ceph::HeartbeatMap::is_healthy (this=this@entry=0x26560e0) at common/HeartbeatMap.cc:130
> > #13 0x0000000000879f08 in ceph::HeartbeatMap::check_touch_file (this=0x26560e0) at common/HeartbeatMap.cc:141
> > #14 0x00000000009189f5 in CephContextServiceThread::entry (this=0x2652200) at common/ceph_context.cc:68
> > #15 0x00007f2f46ad6e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
> > #16 0x00007f2f44c013dd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> > #17 0x0000000000000000 in ?? ()
>
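Frames #10 to #14 of the backtrace show a service thread (`CephContextServiceThread::entry`) calling `HeartbeatMap::check_touch_file`, then `is_healthy`, then `_check`, which asserts "hit suicide timeout". Below is a minimal Python sketch of that watchdog pattern, assuming a simplified model in which each worker thread holds a handle with two deadlines: a grace deadline after which the daemon is only reported unhealthy, and a longer suicide deadline after which the whole process aborts. The names `HeartbeatHandle`, `reset_timeout`, `check` and `HitSuicideTimeout`, and the 15 s / 150 s values, are hypothetical illustrations, not Ceph's actual code or defaults.

```python
from dataclasses import dataclass
from typing import Iterable


class HitSuicideTimeout(RuntimeError):
    """Raised where the real daemon would abort (hypothetical stand-in)."""


@dataclass
class HeartbeatHandle:
    """Per-worker-thread heartbeat state (hypothetical, simplified model)."""
    name: str
    grace: float                  # seconds of silence before 'unhealthy'
    suicide_grace: float          # seconds of silence before abort; 0 disables it
    timeout: float = 0.0          # absolute 'healthy' deadline; 0 means unset
    suicide_timeout: float = 0.0  # absolute abort deadline; 0 means unset

    def reset_timeout(self, now: float) -> None:
        # The worker calls this whenever it makes progress,
        # pushing both deadlines forward from the current time.
        self.timeout = now + self.grace
        self.suicide_timeout = now + self.suicide_grace if self.suicide_grace else 0.0


def check(handles: Iterable[HeartbeatHandle], now: float) -> bool:
    """Periodic health check run by a separate service thread.

    Returns False if any worker has missed its grace deadline and raises
    HitSuicideTimeout if any worker has missed its suicide deadline.
    """
    healthy = True
    for h in handles:
        if h.timeout and now > h.timeout:
            healthy = False  # in this model, only marks the daemon unhealthy
        if h.suicide_timeout and now > h.suicide_timeout:
            raise HitSuicideTimeout(f"{h.name}: hit suicide timeout")
    return healthy


if __name__ == "__main__":
    worker = HeartbeatHandle("op_worker", grace=15.0, suicide_grace=150.0)
    worker.reset_timeout(now=0.0)
    print(check([worker], now=10.0))   # True: within grace
    print(check([worker], now=20.0))   # False: past grace, but still alive
    try:
        check([worker], now=200.0)     # worker stalled past its suicide grace
    except HitSuicideTimeout as exc:
        print(exc)                     # op_worker: hit suicide timeout
```

In this model, a worker that blocks longer than its suicide grace (for example, while stuck on an overloaded disk during startup) never gets to call `reset_timeout`, so the next periodic check aborts the whole process even if the worker would eventually have finished. That is consistent with the slow-start scenario reported above, though the model says nothing about the race condition suspected in the follow-up.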

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
