To be more precise, not only was debug_ms changed from 0 to 1, but debug_osd was also changed from 0 to 50. It looks like a race condition somewhere under the hood that had been concealed by logging delays and was revealed only after those delays were removed (by setting debug_ms/debug_osd to 0).
2013/9/12 Andrey Korolyov <and...@xdel.ru>
> A little follow-up:
>
> One of the cluster nodes (from the not-yet-restarted set) went into some
> kind of flapping state, showing CPU consumption peaks and latency spikes
> every 50 seconds. Even more interesting, when we injected a non-zero
> debug_ms the latency spikes went away, but the CPU ones remained. In the
> picture[0] below we injected debug_ms 1 with the log file set to
> /dev/null at 19:03 and set it back to 0 at 19:13.
>
> 0. http://i.imgur.com/8BBWM7o.png
>
>
> On Wed, Sep 11, 2013 at 5:05 AM, Andrey Korolyov <and...@xdel.ru> wrote:
> > Hello,
> >
> > Got the infamous error on 0.61.8, caused by just a little disk overload
> > on OSD daemon start. I currently have very large metadata per OSD (about
> > 20G); this may be the issue.
> >
> > #0  0x00007f2f46adeb7b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
> > #1  0x0000000000860469 in reraise_fatal (signum=6) at global/signal_handler.cc:58
> > #2  handle_fatal_signal (signum=6) at global/signal_handler.cc:104
> > #3  <signal handler called>
> > #4  0x00007f2f44b45405 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> > #5  0x00007f2f44b48b5b in abort () from /lib/x86_64-linux-gnu/libc.so.6
> > #6  0x00007f2f4544389d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > #7  0x00007f2f45441996 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > #8  0x00007f2f454419c3 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > #9  0x00007f2f45441bee in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > #10 0x000000000090d2fa in ceph::__ceph_assert_fail (assertion=0xa38ab1 "0 == \"hit suicide timeout\"", file=<optimized out>, line=79, func=0xa38c60 "bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)") at common/assert.cc:77
> > #11 0x000000000087914b in ceph::HeartbeatMap::_check (this=this@entry=0x26560e0, h=<optimized out>, who=who@entry=0xa38b40 "is_healthy", now=now@entry=1378860192) at common/HeartbeatMap.cc:79
> > #12 0x0000000000879956 in ceph::HeartbeatMap::is_healthy (this=this@entry=0x26560e0) at common/HeartbeatMap.cc:130
> > #13 0x0000000000879f08 in ceph::HeartbeatMap::check_touch_file (this=0x26560e0) at common/HeartbeatMap.cc:141
> > #14 0x00000000009189f5 in CephContextServiceThread::entry (this=0x2652200) at common/ceph_context.cc:68
> > #15 0x00007f2f46ad6e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
> > #16 0x00007f2f44c013dd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> > #17 0x0000000000000000 in ?? ()
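In the backtrace above, the context's service thread (CephContextServiceThread::entry -> HeartbeatMap::check_touch_file -> is_healthy -> _check) aborts the daemon on the assertion "hit suicide timeout". As a rough illustration of what that kind of check does, here is a simplified Python model. It assumes each worker thread carries a grace deadline and a longer suicide deadline that it pushes forward whenever it makes progress; the class and method names and the 15/150-second values are hypothetical, and this is a sketch of the idea, not Ceph's actual HeartbeatMap code or its defaults.

```python
from dataclasses import dataclass
from typing import Dict


class SuicideTimeout(Exception):
    """Stands in for the fatal assert ("hit suicide timeout") in the backtrace."""


@dataclass
class WorkerHandle:
    name: str
    grace_deadline: float = 0.0    # 0.0 means "no deadline armed"
    suicide_deadline: float = 0.0  # 0.0 means "no suicide deadline"


class HeartbeatMapSketch:
    """Hypothetical, simplified watchdog; not Ceph's HeartbeatMap implementation."""

    def __init__(self) -> None:
        self.workers: Dict[str, WorkerHandle] = {}

    def add_worker(self, name: str) -> WorkerHandle:
        handle = WorkerHandle(name)
        self.workers[name] = handle
        return handle

    def reset_timeout(self, handle: WorkerHandle, now: float,
                      grace: float, suicide_grace: float) -> None:
        # A worker calls this whenever it makes progress, pushing both deadlines forward.
        handle.grace_deadline = now + grace
        handle.suicide_deadline = now + suicide_grace if suicide_grace > 0 else 0.0

    def is_healthy(self, now: float) -> bool:
        # A service thread polls this periodically (cf. check_touch_file -> is_healthy -> _check).
        healthy = True
        for handle in self.workers.values():
            if handle.grace_deadline and now > handle.grace_deadline:
                healthy = False  # stalled past grace: reported unhealthy, process keeps running
            if handle.suicide_deadline and now > handle.suicide_deadline:
                # stalled past suicide grace: the daemon deliberately aborts itself
                raise SuicideTimeout(f"{handle.name}: hit suicide timeout")
        return healthy


if __name__ == "__main__":
    hb = HeartbeatMapSketch()
    worker = hb.add_worker("worker-1")  # hypothetical thread name
    hb.reset_timeout(worker, now=0.0, grace=15.0, suicide_grace=150.0)
    print(hb.is_healthy(now=10.0))  # True: within grace
    print(hb.is_healthy(now=20.0))  # False: past grace, not yet past suicide grace
    try:
        hb.is_healthy(now=200.0)
    except SuicideTimeout as exc:
        print(exc)                  # worker-1: hit suicide timeout
```

In this model a thread that is merely slow (for example, blocked on an overloaded disk during startup) first shows up as unhealthy, and only a stall longer than the suicide grace turns into a deliberate abort, which matches the shape of the failure in the trace.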
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com