Hi, All

I'd like to use Kafka Web Console to monitor offsets and topics. It is
easy to use, but it freezes, stops, or dies too frequently.
I don't think the problem is at the OS level; it seems to be at the
application level.
I've already raised the open file handle limit to 98000 for all users and
reduced the TIME_WAIT timeout to 30s instead of the default 5 minutes.
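
Because the console is started by supervisord, the limit that matters is the one the running JVM actually received, which may differ from the per-user setting. A rough sketch for checking it, assuming a Linux /proc filesystem; the function names are illustrative and not part of any tool mentioned here:

```python
import re
from typing import Optional, Tuple


def parse_max_open_files(limits_text: str) -> Optional[Tuple[str, str]]:
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        match = re.match(r"Max open files\s+(\S+)\s+(\S+)", line)
        if match:
            return match.group(1), match.group(2)
    return None  # row not found


def read_process_limits(pid: int) -> Optional[Tuple[str, str]]:
    """Read the limits of a live process (Linux only; pid is supplied by the caller)."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_max_open_files(handle.read())
```

Calling `read_process_limits(pid)` with the PID of the web console's JVM would show whether the 98000 limit actually reached that process.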

From what I can see in the logs, it starts with play:
[error] play - Cannot invoke the action, eventually got an
error: java.lang.RuntimeException: Exception while executing statement : IO
Exception: "java.io.IOException: Too many open files";
"/etc/kafka-web-console/play"; SQL statement:
delete from offsetPoints
where
(offsetPoints.offsetHistoryId = ?) [90031-172]
errorCode: 90031, sqlState: 90031

Caused by: java.lang.RuntimeException: Exception while executing statement
: IO Exception: "java.io.IOException: Too many open files";
"/etc/kafka-web-console/play"; SQL statement:
delete from offsetPoints
where
(offsetPoints.offsetHistoryId = ?) [90031-172]
errorCode: 90031, sqlState: 90031
delete from offsetPoints

This then seems to cause socket connection errors:
Caused by: java.io.IOException: Too many open files
at java.io.UnixFileSystem.createFileExclusively(Native Method)
~[na:1.7.0_75]
at java.io.File.createNewFile(File.java:1006) ~[na:1.7.0_75]
at org.h2.store.fs.FilePathDisk.createTempFile(FilePathDisk.java:367)
~[h2.jar:1.3.172]
at org.h2.store.fs.FileUtils.createTempFile(FileUtils.java:329)
~[h2.jar:1.3.172]
at org.h2.engine.Database.createTempFile(Database.java:1529)
~[h2.jar:1.3.172]
at org.h2.result.RowList.writeAllRows(RowList.java:90) ~[h2.jar:1.3.172]
[debug] application - Getting partition leaders for topic
topic-exist-test
[debug] application - Getting partition leaders for topic
topic-rep-3-test
[debug] application - Getting partition leaders for topic
PofApiTest
[debug] application - Getting partition leaders for topic
PofApiTest-2
[debug] application - Getting partition leaders for topic
fileread
[debug] application - Getting partition leaders for topic
pageview
[debug] application - Getting partition log sizes for topic
topic-exist-test from partition leaders 10.100.71.42:9092, 10.100.71.42:9092,
10.100.71.42:9092, 10.100.71.42:9092, 10.100.71.42:9092, 10.100.71.42:9092,
10.100.71.42:9092, 10.100.71.42:9092
[warn] application - Could not connect to partition leader
10.100.71.42:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader
10.100.71.42:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader
10.100.71.42:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader
10.100.71.42:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader
10.100.71.42:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader
10.100.71.42:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader
10.100.71.42:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader
10.100.71.42:9092. Error message: Failed to open a socket.
[debug] application - Getting partition offsets for topic
topic-exist-test

-jar:9092, exemplary-birds:9092, voluminous-mass:9092
[warn] application - Could not connect to partition leader
voluminous-mass:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader
exemplary-birds:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader
harmful-jar:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader
voluminous-mass:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader
exemplary-birds:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader
harmful-jar:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader
voluminous-mass:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader
exemplary-birds:9092. Error message: Failed to open a socket.
[debug] application - Getting partition offsets for topic
PofApiTest
[debug] application - Getting partition log sizes for topic
topic-rep-3-test from partition leaders exemplary-birds:9092,
voluminous-mass:9092, harmful-jar:9092, exemplary-birds:9092,
voluminous-mass:9092, harmful-jar:9092, exemplary-birds:9092,
voluminous-mass:9092
[debug] application - Getting partition log sizes for topic
fileread from partition leaders voluminous-mass:9092, harmful-jar:9092,
exemplary-birds:9092, voluminous-mass:9092, harmful-jar:9092,
exemplary-birds:9092, voluminous-mass:9092, harmful-jar:9092
[warn] application - Could not connect to partition leader
exemplary-birds:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader
harmful-jar:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader
voluminous-mass:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader
exemplary-birds:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader
harmful-jar:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader
voluminous-mass:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader
exemplary-birds:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader
harmful-jar:9092. Error message: Failed to open a socket.
[debug] application - Getting partition offsets for topic
PofApiTest-2
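
To see whether the socket failures are spread evenly across brokers or concentrated on one, the warnings above can be tallied per partition leader. A small sketch, assuming the log text is available as a string; the function name is hypothetical and the pattern is derived only from the message format shown in these logs:

```python
import re
from collections import Counter

# Matches "Could not connect to partition leader <host>:<port>." even when
# the log line has been wrapped between "leader" and the address.
FAILED_LEADER = re.compile(
    r"Could not connect to partition leader\s+(\S+:\d+)\."
)


def count_failed_leaders(log_text: str) -> Counter:
    """Count 'Failed to open a socket' warnings per partition leader."""
    return Counter(FAILED_LEADER.findall(log_text))
```

If every leader fails at roughly the same rate, that would point at the monitoring process itself (for example, descriptor exhaustion) rather than at any single broker.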

This in turn leaves connections from the monitoring box to the production
server in TIME_WAIT:
1 tcp6 0 0 10.100.68.48:35050 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35051 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35055 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35057 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35064 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35065 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35066 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35073 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35074 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35075 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35085 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35088 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35100 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35103 10.100.98.100:9092 TIME_WAIT
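
For tracking how many of these pile up over time, the netstat output can be counted per remote endpoint. A minimal sketch, assuming lines in the layout shown above (state in the last column, remote address just before it); the function name is illustrative:

```python
from collections import Counter
from typing import Iterable


def count_time_wait(lines: Iterable[str]) -> Counter:
    """Count TIME_WAIT connections per remote address:port."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Last column is the TCP state; the one before it is the remote endpoint.
        if len(fields) >= 3 and fields[-1] == "TIME_WAIT":
            counts[fields[-2]] += 1
    return counts
```

Sampling this every few seconds while the console runs would show whether the count grows steadily before each freeze.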

But that only lasts for about 30 seconds to 1 minute. supervisord then seems
to restart the web console once these TIME_WAITs go away, or once the sockets
and files are properly closed, or once they get flushed by either
Play/web console or Kafka.
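
One way to tell whether the descriptors are being used up by sockets or by files (such as the H2 temp files in the stack trace) is to classify what the process has open. A sketch assuming a Linux /proc filesystem and permission to read the process's fd directory; both function names are hypothetical:

```python
import os
from collections import Counter
from typing import Iterable, List


def classify_fd_targets(targets: Iterable[str]) -> Counter:
    """Group /proc/<pid>/fd symlink targets into socket, pipe, anon_inode, file."""
    kinds = Counter()
    for target in targets:
        if target.startswith("socket:"):
            kinds["socket"] += 1
        elif target.startswith("pipe:"):
            kinds["pipe"] += 1
        elif target.startswith("anon_inode:"):
            kinds["anon_inode"] += 1
        else:
            kinds["file"] += 1  # regular files, devices, deleted temp files
    return kinds


def open_fd_targets(pid: int) -> List[str]:
    """Resolve every descriptor of a live process (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
    return targets
```

Running `classify_fd_targets(open_fd_targets(pid))` shortly before a freeze would show which kind of descriptor dominates.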

Any ideas?

-- 

Alec Li
