GitHub user radai-rosenblatt opened a pull request: https://github.com/apache/kafka/pull/4223
when closing a socket in response to an IOException, also print the root issue if closing fails `Selector.pollSelectionKeys()` attempts to close the channel in response to an Exception (lets call this exception the root issue). if the root issue itself is an IOException, its printed to log at debug level (which is usually off for production users): ```java catch (Exception e) { String desc = channel.socketDescription(); if (e instanceof IOException) log.debug("Connection with {} disconnected", desc, e); <----- does not appear in real-life log else if (e instanceof AuthenticationException) // will be logged later as error by clients log.debug("Connection with {} disconnected due to authentication exception", desc, e); else log.warn("Unexpected error from {}; closing connection", desc, e); close(channel, true); } ``` for some cases, close itself would throw an exception. this exception is printed to log as a warning (`Selector.doClose()`): ```java try { channel.close(); } catch (IOException e) { log.error("Exception closing connection to node {}:", channel.id(), e); } ``` this tends to actually show up in user log, looking something like this (note - line numbers are from kafka 10.2.*): ``` java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcherImpl.write0(Native Method) at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) at sun.nio.ch.IOUtil.write(IOUtil.java:65) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:470) at org.apache.kafka.common.network.SslTransportLayer.flush(SslTransportLayer.java:195) at org.apache.kafka.common.network.SslTransportLayer.close(SslTransportLayer.java:163) at org.apache.kafka.common.utils.Utils.closeAll(Utils.java:731) at org.apache.kafka.common.network.KafkaChannel.close(KafkaChannel.java:54) at org.apache.kafka.common.network.Selector.doClose(Selector.java:540) at org.apache.kafka.common.network.Selector.close(Selector.java:531) at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:378) at org.apache.kafka.common.network.Selector.poll(Selector.java:303) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:349) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:233) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:131) at java.lang.Thread.run(Thread.java:745) ``` this issue spams user's logs and is not really helpful in diagnosing the real underlying cause, which (after turning debug logs on) turned out to be (for this particular case): ``` javax.net.ssl.SSLHandshakeException: General SSLEngine problem at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1409) at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:535) at sun.security.ssl.SSLEngineImpl.writeAppRecord(SSLEngineImpl.java:1214) at sun.security.ssl.SSLEngineImpl.wrap(SSLEngineImpl.java:1186) at javax.net.ssl.SSLEngine.wrap(SSLEngine.java:469) at org.apache.kafka.common.network.SslTransportLayer.handshakeWrap(SslTransportLayer.java:382) at org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:243) at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:69) at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:350) at org.apache.kafka.common.network.Selector.poll(Selector.java:303) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:349) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:233) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:131) at java.lang.Thread.run(Thread.java:745) Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1728) at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:304) at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296) at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1478) at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:212) at sun.security.ssl.Handshaker.processLoop(Handshaker.java:957) at sun.security.ssl.Handshaker$1.run(Handshaker.java:897) at sun.security.ssl.Handshaker$1.run(Handshaker.java:894) at java.security.AccessController.doPrivileged(Native Method) at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1347) at org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks(SslTransportLayer.java:336) at org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(SslTransportLayer.java:417) at org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:270) ... 7 more Caused by: sun.security.validator.ValidatorException: No trusted certificate found at sun.security.validator.SimpleValidator.buildTrustedChain(SimpleValidator.java:384) at sun.security.validator.SimpleValidator.engineValidate(SimpleValidator.java:133) at sun.security.validator.Validator.validate(Validator.java:260) at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324) at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:281) at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:136) at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1465) ... 16 more ``` this patch allows submitting the root cause exception (the one caught in pollSelectionKeys) to close so that in the event close encounters an exception the root cause is also printed alongside the close exception. You can merge this pull request into a Git repository by running: $ git pull https://github.com/radai-rosenblatt/kafka better-logging Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/4223.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4223 ---- commit d05e4fda9171f8cf0b73d175c05dc15ed99deba4 Author: radai-rosenblatt <radai.rosenbl...@gmail.com> Date: 2017-11-15T23:12:42Z when closing a socket in response to an IOException, also print the root issue if closing fails ---- ---