Hi,

I am evaluating Icinga2 on a large cluster (3k nodes) and I am running into a problem.
Icinga2 is slowly opening too many files and spawning too many threads, and in the end it crashes. Here is the stack trace I see in the log:

[2015-10-06 00:00:20 +0530] critical/ThreadPool: Exception thrown in event handler:
Error: boost::thread_resource_error

(0) libboost_thread.so.1.49.0: void boost::throw_exception<boost::thread_resource_error>(boost::thread_resource_error const&) (+0xad) [0x7f7a0124541d]
(1) libboost_thread.so.1.49.0: boost::thread::start_thread() (+0xeb) [0x7f7a0123ef8b]
(2) libbase.so: boost::thread::thread<boost::_bi::bind_t<void, boost::_mfi::mf0<void, icinga::WorkQueue>, boost::_bi::list1<boost::_bi::value<icinga::WorkQueue*> > > >(boost::_bi::bind_t<void, boost::_mfi::mf0<void, icinga::WorkQueue>, boost::_bi::list1<boost::_bi::value<icinga::WorkQueue*> > >, boost::disable_if<boost::is_convertible<boost::_bi::bind_t<void, boost::_mfi::mf0<void, icinga::WorkQueue>, boost::_bi::list1<boost::_bi::value<icinga::WorkQueue*> > >&, boost::detail::thread_move_t<boost::_bi::bind_t<void, boost::_mfi::mf0<void, icinga::WorkQueue>, boost::_bi::list1<boost::_bi::value<icinga::WorkQueue*> > > > >, boost::thread::dummy*>::type) (+0x49) [0x7f7a00b9e699]
(3) libbase.so: boost::thread* boost::thread_group::create_thread<boost::_bi::bind_t<void, boost::_mfi::mf0<void, icinga::WorkQueue>, boost::_bi::list1<boost::_bi::value<icinga::WorkQueue*> > > >(boost::_bi::bind_t<void, boost::_mfi::mf0<void, icinga::WorkQueue>, boost::_bi::list1<boost::_bi::value<icinga::WorkQueue*> > >) (+0x77) [0x7f7a00b9e737]
(4) libbase.so: icinga::WorkQueue::Enqueue(boost::function<void ()> const&, bool) (+0x7a) [0x7f7a00b50c7a]
(5) libremote.so: icinga::ApiClient::SendMessage(boost::intrusive_ptr<icinga::Dictionary> const&) (+0x6b) [0x7f7a001ca1ab]
(6) libremote.so: icinga::ApiListener::SendConfigUpdate(boost::intrusive_ptr<icinga::ApiClient> const&) (+0x19fa) [0x7f7a001dbcca]
(7) libremote.so: icinga::ApiListener::NewClientHandler(boost::intrusive_ptr<icinga::Socket> const&, icinga::String const&, icinga::ConnectionRole) (+0x2de) [0x7f7a001e2e5e]
(8) libbase.so: icinga::ThreadPool::WorkerThread::ThreadProc(icinga::ThreadPool::Queue&) (+0x2c1) [0x7f7a00b67a51]
(9) libboost_thread.so.1.49.0: <unknown function> (+0x10629) [0x7f7a0123f629]
(10) libpthread.so.0: <unknown function> (+0x6b50) [0x7f7a00e1eb50]
(11) libc.so.6: clone (+0x6d) [0x7f79fe68c95d]

[2015-10-06 00:00:20 +0530] critical/ApiListener: Cannot accept new connection.
[2015-10-06 00:00:20 +0530] critical/Socket: accept() failed with error code 24, "Too many open files"

At that time there were 16.4k open files, of which 8.6k were TCP connections and 7.8k were pipes. There were also 16.3k threads. Of the 8.6k TCP connections, 2.4k were in CLOSE_WAIT and the rest were in ESTABLISHED state. There were up to 5 established connections to a single client. The open file limit of the process is 16384 (even though I have set a much larger global limit).

The main process doesn't crash, but most of the checks are in Unknown state with this error:

Error: Function call 'pipe2' failed with error code 24, 'Too many open files'

Version: icinga2 2.3.10-1~debmon70+1
OS: Debian Wheezy

Any pointers for debugging this further would be much appreciated. I found similar bugs reported, but none of them is in the open state.

Thanks,
~jj
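
P.S. For anyone who wants to gather the same kind of numbers (open files by type, thread count, per-process open file limit) on their own setup, here is a minimal sketch in Python. It assumes a Linux /proc filesystem and enough permissions to read /proc/<pid>/fd. The function names are made up for illustration and nothing here comes from Icinga2 itself.

```python
import os
import sys
from collections import Counter


def classify_fd_target(target):
    """Map a /proc/<pid>/fd symlink target to a coarse kind."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    return "file"


def count_fds(pid):
    """Count the open descriptors of a process, grouped by kind."""
    counts = Counter()
    fd_dir = "/proc/%d/fd" % pid
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed while we were scanning
        counts[classify_fd_target(target)] += 1
    return counts


def count_threads(pid):
    """Each thread of a process shows up as one entry in /proc/<pid>/task."""
    return len(os.listdir("/proc/%d/task" % pid))


def parse_open_files_limit(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Values are kept as strings because a limit can also read 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            return fields[3], fields[4]
    return None


if __name__ == "__main__" and len(sys.argv) == 2:
    pid = int(sys.argv[1])
    with open("/proc/%d/limits" % pid) as f:
        limit = parse_open_files_limit(f.read())
    print("fds by kind:", dict(count_fds(pid)))
    print("threads:", count_threads(pid))
    print("open files limit (soft, hard):", limit)
```

Run it with the PID of the process to inspect as the only argument. Note that the "socket" kind lumps TCP and Unix sockets together; splitting TCP connections by state (CLOSE_WAIT, ESTABLISHED) would need an extra lookup, for example in /proc/net/tcp.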