Sorry for reposting the previous message as the images didn’t come thru.. 
pasting as text. 

I have changed both system and user limits. 

To completely isolate the problem, I have tried the application in Centos 7 
environment. Set the ulimit to 1million and system limits to 10million open 
files. Now 3 kafka nodes are running in separate servers each and streaming 
application is running in a dedicated VM. Now the application is not explicitly 
throwing “too many open files” error but automatically aborted with the message 

terminate called after throwing an instance of 'std::system_error'
  what():  Resource temporarily unavailable
./bin/start.sh: line 42:  1100 Aborted 

Here is the last few lines from strace output which shows the aborted message. 

25721 14:58:35 
open("/home/devops/data/kafka-streams/RawLog_Processor/393_7/MAIL.INCOMING_FILTER_BY_USER_MAIL_INCOMING_DOMAIN_TRAFFIC_DETAIL.DETECTED_CONTENT_FILTER_RECIPIENT_DOMAIN/MAIL.INCOMING_FILTER_BY_USER_MAIL_INCOMING_DOMAIN_TRAFFIC_DETAIL.DETECTED_CONTENT_FILTER_RECIPIENT_DOMAIN.1555200000/MANIFEST-000007",
 O_RDONLY|O_CLOEXEC) = 12505
25721 14:58:35 open("/sys/devices/virtual/block/dm-2/queue/logical_block_size", 
O_RDONLY) = 12506
25721 14:58:35 read(12506, "512\n", 4096) = 4
25721 14:58:35 close(12506)             = 0
25721 14:58:35 write(12502, "s.advise_random_on_open: 0\n2019/"..., 4096) = 4096
25721 14:58:35 write(12502, "ions.comparator: leveldb.Bytewis"..., 4096) = 4096
25721 14:58:35 read(12505, 
"V\371\270\370\34\0\1\1\32leveldb.BytewiseCompara"..., 32768) = 192
25721 14:58:35 read(12505, "", 28672)   = 0
25721 14:58:35 close(12505)             = 0
17701 14:58:35 
open("/home/devops/data/kafka-streams/RawLog_Processor/393_7/MAIL.INCOMING_FILTER_BY_USER_MAIL_INCOMING_DOMAIN_TRAFFIC_DETAIL.DETECTED_CONTENT_FILTER_RECIPIENT_DOMAIN/MAIL.INCOMING_FILTER_BY_USER_MAIL_INCOMING_DOMAIN_TRAFFIC_DETAIL.DETECTED_CONTENT_FILTER_RECIPIENT_DOMAIN.1555200000/000006.sst",
 O_RDONLY|O_CLOEXEC) = 12505
17702 14:58:35 +++ exited with 0 +++
25721 14:58:35 write(2, "terminate called after throwing "..., 48) = 48
25721 14:58:35 write(2, "std::system_error", 17) = 17
25721 14:58:35 write(2, "'\n", 2)       = 2
25721 14:58:35 write(2, "  what():  ", 11) = 11
25721 14:58:35 write(2, "Resource temporarily unavailable", 32) = 32
25721 14:58:35 write(2, "\n", 1)        = 1
17701 14:58:35 open("/sys/devices/virtual/block/dm-2/queue/logical_block_size", 
O_RDONLY) = 12506
25721 14:58:35 write(12502, "ction_dynamic_level_bytes: 0\n201"..., 3672) = 3672
25721 14:58:35 --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=25697, 
si_uid=1000} ---
17701 14:58:35 read(12506,  <unfinished ...>
17701 14:58:36 +++ killed by SIGABRT +++
17700 14:58:36 +++ killed by SIGABRT +++
17699 14:58:36 +++ killed by SIGABRT +++


As I can see from the open files they are way lower (45201) than the max limit 
and hence can we rule out the “open files” is the root cause? 

I have also noticed there is lot of EAGAIN (count 3334) and associated 
“Resource temporarily unavailable” messages as well



25732 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 
59
25732 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25732 14:49:23 close(59)                = 0
25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 
59
25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25709 14:49:23 close(59)                = 0
25721 14:49:23 write(50, "\0", 1)       = 1
25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 
59
25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25709 14:49:23 close(59)                = 0
25721 14:49:23 read(17, 
"PK\3\4\n\0\0\10\10\0\343SHNM\n4>Q\2\0\0\265\5\0\0A\0\0\0", 30) = 30
25721 14:49:23 read(17, 
"\245T]O\23A\24=\323\226\326-\253\255\250\365\v\4\265ji\221E\344\305\200M\f\201\304\244j"...,
 593) = 593
25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 
59
25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25709 14:49:23 close(59)                = 0
25732 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 
59
25732 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25732 14:49:23 close(59)                = 0
25721 14:49:23 read(49, "\0", 16)       = 1
25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 
59
25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25709 14:49:23 close(59)                = 0
25732 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 
59
25732 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25732 14:49:23 close(59)                = 0
25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 
59
25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25709 14:49:23 close(59)                = 0
25721 14:49:23 read(54, "\0\1dx", 4)    = 4
25721 14:49:23 read(54, 
"\0\0\0\v\0\0\0\0\0\0\0\1dj\0\0\0\0\1\212\0\201RawLog_Pro"..., 91256) = 91256
25721 14:49:23 read(54, 0x7f7044003b80, 4) = -1 EAGAIN (Resource temporarily 
unavailable)
25732 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 
59
25732 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25721 14:49:23 write(35, "2019-07-03 14:49:23.258 [rawLP1-"..., 219 <unfinished 
...>
25732 14:49:23 close(59)                = 0
25721 14:49:23 <... write resumed> )    = 219
25721 14:49:23 write(1, "2019-07-03 14:49:23.258 [rawLP1-"..., 219) = 219
25721 14:49:23 write(50, "\0", 1)       = 1
25721 14:49:23 read(49, "\0", 16)       = 1
25708 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 
59
25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 
60
25708 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25708 14:49:23 close(59)                = 0
25709 14:49:23 read(60, "9223372036854775807\n", 4096) = 20
25709 14:49:23 close(60)                = 0
25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 
59
25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25709 14:49:23 close(59)                = 0
25708 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 
59
25708 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25708 14:49:23 close(59)                = 0
25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 
59
25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25709 14:49:23 close(59)                = 0
25732 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 
59
25732 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25732 14:49:23 close(59)                = 0
25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 
59
25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20

But unable to come to any conclusion from these findings. Any insight is 
appreciated. 

Thanks
Thameem

> On Jul 4, 2019, at 9:18 AM, Thameem Ansari <thame...@gmail.com> wrote:
> 
> <strace2.png>
> 
> 
> But unable to come to any conclusion from these findings. Any insight is 
> appreciated. 
> 
> Thanks
> Thameem

Reply via email to