Sorry for reposting the previous message as the images didn’t come thru.. pasting as text.
I have changed both system and user limits. To completely isolate the problem, I have tried the application in Centos 7 environment. Set the ulimit to 1million and system limits to 10million open files. Now 3 kafka nodes are running in separate servers each and streaming application is running in a dedicated VM. Now the application is not explicitly throwing “too many open files” error but automatically aborted with the message terminate called after throwing an instance of 'std::system_error' what(): Resource temporarily unavailable ./bin/start.sh: line 42: 1100 Aborted Here is the last few lines from strace output which shows the aborted message. 25721 14:58:35 open("/home/devops/data/kafka-streams/RawLog_Processor/393_7/MAIL.INCOMING_FILTER_BY_USER_MAIL_INCOMING_DOMAIN_TRAFFIC_DETAIL.DETECTED_CONTENT_FILTER_RECIPIENT_DOMAIN/MAIL.INCOMING_FILTER_BY_USER_MAIL_INCOMING_DOMAIN_TRAFFIC_DETAIL.DETECTED_CONTENT_FILTER_RECIPIENT_DOMAIN.1555200000/MANIFEST-000007", O_RDONLY|O_CLOEXEC) = 12505 25721 14:58:35 open("/sys/devices/virtual/block/dm-2/queue/logical_block_size", O_RDONLY) = 12506 25721 14:58:35 read(12506, "512\n", 4096) = 4 25721 14:58:35 close(12506) = 0 25721 14:58:35 write(12502, "s.advise_random_on_open: 0\n2019/"..., 4096) = 4096 25721 14:58:35 write(12502, "ions.comparator: leveldb.Bytewis"..., 4096) = 4096 25721 14:58:35 read(12505, "V\371\270\370\34\0\1\1\32leveldb.BytewiseCompara"..., 32768) = 192 25721 14:58:35 read(12505, "", 28672) = 0 25721 14:58:35 close(12505) = 0 17701 14:58:35 open("/home/devops/data/kafka-streams/RawLog_Processor/393_7/MAIL.INCOMING_FILTER_BY_USER_MAIL_INCOMING_DOMAIN_TRAFFIC_DETAIL.DETECTED_CONTENT_FILTER_RECIPIENT_DOMAIN/MAIL.INCOMING_FILTER_BY_USER_MAIL_INCOMING_DOMAIN_TRAFFIC_DETAIL.DETECTED_CONTENT_FILTER_RECIPIENT_DOMAIN.1555200000/000006.sst", O_RDONLY|O_CLOEXEC) = 12505 17702 14:58:35 +++ exited with 0 +++ 25721 14:58:35 write(2, "terminate called after throwing "..., 48) = 48 25721 14:58:35 write(2, "std::system_error", 17) = 17 25721 14:58:35 write(2, "'\n", 2) = 2 25721 14:58:35 write(2, " what(): ", 11) = 11 25721 14:58:35 write(2, "Resource temporarily unavailable", 32) = 32 25721 14:58:35 write(2, "\n", 1) = 1 17701 14:58:35 open("/sys/devices/virtual/block/dm-2/queue/logical_block_size", O_RDONLY) = 12506 25721 14:58:35 write(12502, "ction_dynamic_level_bytes: 0\n201"..., 3672) = 3672 25721 14:58:35 --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=25697, si_uid=1000} --- 17701 14:58:35 read(12506, <unfinished ...> 17701 14:58:36 +++ killed by SIGABRT +++ 17700 14:58:36 +++ killed by SIGABRT +++ 17699 14:58:36 +++ killed by SIGABRT +++ As I can see from the open files they are way lower (45201) than the max limit and hence can we rule out the “open files” is the root cause? I have also noticed there is lot of EAGAIN (count 3334) and associated “Resource temporarily unavailable” messages as well 25732 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59 25732 14:49:23 read(59, "9223372036854775807\n", 4096) = 20 25732 14:49:23 close(59) = 0 25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59 25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20 25709 14:49:23 close(59) = 0 25721 14:49:23 write(50, "\0", 1) = 1 25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59 25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20 25709 14:49:23 close(59) = 0 25721 14:49:23 read(17, "PK\3\4\n\0\0\10\10\0\343SHNM\n4>Q\2\0\0\265\5\0\0A\0\0\0", 30) = 30 25721 14:49:23 read(17, "\245T]O\23A\24=\323\226\326-\253\255\250\365\v\4\265ji\221E\344\305\200M\f\201\304\244j"..., 593) = 593 25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59 25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20 25709 14:49:23 close(59) = 0 25732 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59 25732 14:49:23 read(59, "9223372036854775807\n", 4096) = 20 25732 14:49:23 close(59) = 0 25721 14:49:23 read(49, "\0", 16) = 1 25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59 25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20 25709 14:49:23 close(59) = 0 25732 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59 25732 14:49:23 read(59, "9223372036854775807\n", 4096) = 20 25732 14:49:23 close(59) = 0 25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59 25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20 25709 14:49:23 close(59) = 0 25721 14:49:23 read(54, "\0\1dx", 4) = 4 25721 14:49:23 read(54, "\0\0\0\v\0\0\0\0\0\0\0\1dj\0\0\0\0\1\212\0\201RawLog_Pro"..., 91256) = 91256 25721 14:49:23 read(54, 0x7f7044003b80, 4) = -1 EAGAIN (Resource temporarily unavailable) 25732 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59 25732 14:49:23 read(59, "9223372036854775807\n", 4096) = 20 25721 14:49:23 write(35, "2019-07-03 14:49:23.258 [rawLP1-"..., 219 <unfinished ...> 25732 14:49:23 close(59) = 0 25721 14:49:23 <... write resumed> ) = 219 25721 14:49:23 write(1, "2019-07-03 14:49:23.258 [rawLP1-"..., 219) = 219 25721 14:49:23 write(50, "\0", 1) = 1 25721 14:49:23 read(49, "\0", 16) = 1 25708 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59 25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 60 25708 14:49:23 read(59, "9223372036854775807\n", 4096) = 20 25708 14:49:23 close(59) = 0 25709 14:49:23 read(60, "9223372036854775807\n", 4096) = 20 25709 14:49:23 close(60) = 0 25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59 25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20 25709 14:49:23 close(59) = 0 25708 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59 25708 14:49:23 read(59, "9223372036854775807\n", 4096) = 20 25708 14:49:23 close(59) = 0 25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59 25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20 25709 14:49:23 close(59) = 0 25732 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59 25732 14:49:23 read(59, "9223372036854775807\n", 4096) = 20 25732 14:49:23 close(59) = 0 25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59 25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20 But unable to come to any conclusion from these findings. Any insight is appreciated. Thanks Thameem > On Jul 4, 2019, at 9:18 AM, Thameem Ansari <thame...@gmail.com> wrote: > > <strace2.png> > > > But unable to come to any conclusion from these findings. Any insight is > appreciated. > > Thanks > Thameem