Hi

I am evaluating Icinga2 on a large cluster (3k nodes) and I am running into
an issue.

Icinga2 slowly opens too many files and spawns too many threads, and
eventually crashes.

Here is the stack trace I see in the log:
[2015-10-06 00:00:20 +0530] critical/ThreadPool: Exception thrown in event
handler:
Error: boost::thread_resource_error
        (0) libboost_thread.so.1.49.0: void
boost::throw_exception<boost::thread_resource_error>(boost::thread_resource_error
const&) (+0xad) [0x7f7a0124541d]
        (1) libboost_thread.so.1.49.0: boost::thread::start_thread()
(+0xeb) [0x7f7a0123ef8b]
        (2) libbase.so: boost::thread::thread<boost::_bi::bind_t<void,
boost::_mfi::mf0<void, icinga::WorkQueue>,
boost::_bi::list1<boost::_bi::value<icinga::WorkQueue*> > >
>(boost::_bi::bind_t<void, boost::_mfi::mf0<void, icinga::WorkQueue>,
boost::_bi::list1<boost::_bi::value<icinga::WorkQueue*> > >,
boost::disable_if<boost::is_convertible<boost::_bi::bind_t<void,
boost::_mfi::mf0<void, icinga::WorkQueue>,
boost::_bi::list1<boost::_bi::value<icinga::WorkQueue*> > >&,
boost::detail::thread_move_t<boost::_bi::bind_t<void,
boost::_mfi::mf0<void, icinga::WorkQueue>,
boost::_bi::list1<boost::_bi::value<icinga::WorkQueue*> > > > >,
boost::thread::dummy*>::type) (+0x49) [0x7f7a00b9e699]
        (3) libbase.so: boost::thread*
boost::thread_group::create_thread<boost::_bi::bind_t<void,
boost::_mfi::mf0<void, icinga::WorkQueue>,
boost::_bi::list1<boost::_bi::value<icinga::WorkQueue*> > >
>(boost::_bi::bind_t<void, boost::_mfi::mf0<void, icinga::WorkQueue>,
boost::_bi::list1<boost::_bi::value<icinga::WorkQueue*> > >) (+0x77)
[0x7f7a00b9e737]
        (4) libbase.so: icinga::WorkQueue::Enqueue(boost::function<void ()>
const&, bool) (+0x7a) [0x7f7a00b50c7a]
        (5) libremote.so:
icinga::ApiClient::SendMessage(boost::intrusive_ptr<icinga::Dictionary>
const&) (+0x6b) [0x7f7a001ca1ab]
        (6) libremote.so:
icinga::ApiListener::SendConfigUpdate(boost::intrusive_ptr<icinga::ApiClient>
const&) (+0x19fa) [0x7f7a001dbcca]
        (7) libremote.so:
icinga::ApiListener::NewClientHandler(boost::intrusive_ptr<icinga::Socket>
const&, icinga::String const&, icinga::ConnectionRole) (+0x2de)
[0x7f7a001e2e5e]
        (8) libbase.so:
icinga::ThreadPool::WorkerThread::ThreadProc(icinga::ThreadPool::Queue&)
(+0x2c1) [0x7f7a00b67a51]
        (9) libboost_thread.so.1.49.0: <unknown function> (+0x10629)
[0x7f7a0123f629]
        (10) libpthread.so.0: <unknown function> (+0x6b50) [0x7f7a00e1eb50]
        (11) libc.so.6: clone (+0x6d) [0x7f79fe68c95d]
[2015-10-06 00:00:20 +0530] critical/ApiListener: Cannot accept new
connection.
[2015-10-06 00:00:20 +0530] critical/Socket: accept() failed with error
code 24, "Too many open files"

At that time, there were 16.4k open files, of which 8.6k were TCP
connections and 7.8k were pipes. There were also 16.3k threads.
Of the 8.6k TCP connections, 2.4k were in CLOSE_WAIT and the rest were
in ESTABLISHED state. There were up to 5 established connections to a single
client.
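
For anyone who wants to collect the same kind of breakdown, here is a
minimal sketch, assuming a Linux /proc filesystem and enough privileges to
read the entries of the icinga2 process. The function names are made up for
illustration and are not part of Icinga2.

```python
import os
from collections import Counter


def classify_fd_target(target: str) -> str:
    """Classify a /proc/<pid>/fd symlink target by its kernel prefix."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    return "file"


def count_fd_types(pid: int) -> Counter:
    """Count the open file descriptors of a process, grouped by type."""
    counts = Counter()
    fd_dir = f"/proc/{pid}/fd"
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        counts[classify_fd_target(target)] += 1
    return counts


def count_threads(pid: int) -> int:
    """Count the threads of a process (one /proc/<pid>/task entry each)."""
    return len(os.listdir(f"/proc/{pid}/task"))
```

Calling count_fd_types(pid) and count_threads(pid) with the PID of the
icinga2 process at regular intervals should show whether sockets, pipes and
threads grow together over time. Note that the "socket" bucket also includes
non-TCP sockets.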

The open file limit of the process is 16384 (even though I have set a much
larger global limit).
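
The limit that actually applies to the running process can be read from
/proc/<pid>/limits. Below is a minimal sketch, assuming Linux and the usual
column layout of that file; the helper names are hypothetical.

```python
from typing import Optional, Tuple


def _to_limit(value: str) -> Optional[int]:
    """Convert a limits column to an int, or None for 'unlimited'."""
    return None if value == "unlimited" else int(value)


def parse_max_open_files(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return the (soft, hard) 'Max open files' values from limits text."""
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # The columns after the label are: soft limit, hard limit, units.
            soft, hard = line[len(prefix):].split()[:2]
            return _to_limit(soft), _to_limit(hard)
    raise ValueError("no 'Max open files' line found")


def read_max_open_files(pid: int) -> Tuple[Optional[int], Optional[int]]:
    """Read the (soft, hard) open file limits of a running process."""
    with open(f"/proc/{pid}/limits") as fh:
        return parse_max_open_files(fh.read())
```

Comparing this per-process value with the configured global limit shows
which of the two the daemon is really running with.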

The main process doesn't crash, but most of the checks are in the Unknown
state with this error:

Error: Function call 'pipe2' failed with error code 24, 'Too many open files'


Version:

icinga2 2.3.10-1~debmon70+1

OS: Debian Wheezy

Any pointers for debugging this further would be much appreciated. I found
similar bug reports, but none of them is still open.


Thanks,

~jj