I generally see these exceptions when the cluster is overloaded. What I think is happening: the app/driver sends a read request, and the coordinator takes a long time to respond because the nodes are busy serving other requests. The driver gives up once its client-side timeout is reached and closes the socket. Meanwhile, the coordinator eventually gets results from the replicas and tries to send the response back to the app/driver, but it can't because the connection is no longer there.

Does this scenario sound plausible for your cluster?
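To make the sequence concrete, here is a minimal Python sketch with plain TCP sockets. It is not Cassandra or driver code: the function name, the delays, and the fake query string are all made up for illustration, and setting SO_LINGER to zero is an assumption used only to force the client's close to send a TCP reset (a real driver may close its connections differently).

```python
import socket
import struct
import threading
import time


def simulate_client_timeout(server_delay=0.5, client_timeout=0.1):
    """Return (client_outcome, server_outcome) for one slow request."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    server_outcome = []

    def slow_coordinator():
        conn, _ = listener.accept()
        with conn:
            conn.recv(1024)           # read the "query"
            time.sleep(server_delay)  # stand-in for an overloaded node
            try:
                conn.sendall(b"rows")  # reply to a client that has already left
                conn.recv(1024)
                server_outcome.append("ok")
            except ConnectionError as exc:
                # Typically a reset or broken pipe, depending on the platform.
                server_outcome.append(type(exc).__name__)

    worker = threading.Thread(target=slow_coordinator)
    worker.start()

    client = socket.create_connection(listener.getsockname())
    client.settimeout(client_timeout)
    client.sendall(b"SELECT ...")
    try:
        client.recv(1024)
        client_outcome = "response"
    except socket.timeout:
        client_outcome = "timeout"  # the client gives up before the reply

    # Assumption for the demo: linger on with 0 seconds makes close() send
    # an RST instead of a normal FIN, so the server reliably sees a reset.
    client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                      struct.pack("ii", 1, 0))
    client.close()

    worker.join()
    listener.close()
    return client_outcome, server_outcome[0]


if __name__ == "__main__":
    print(simulate_client_timeout())
```

The client side reports a timeout, and the server side gets a connection error instead of delivering its reply, which is the same shape as the "Connection reset by peer" entries in the Cassandra log.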
Erick Ramirez | Developer Relations
erick.rami...@datastax.com | datastax.com

On Wed, 12 Feb 2020 at 21:13, Hanauer, Arnulf, Vodacom South Africa (External) <arnulf.hana...@vcontractor.co.za> wrote:

> Hi Cassandra folks,
>
> We are getting a lot of these errors and transactions are timing out and I
> was wondering if this can be caused by Cassandra itself or if this is a
> genuine Linux network issue only. The client job reports Cassandra node
> down after this occurs but I suspect this is due to the connection failure
> – need some clarification as where to go look for a solution.
>
> INFO [epollEventLoopGroup-2-10] 2020-02-12 11:53:42,748 Message.java:623 - Unexpected exception during request; channel = [id: 0x8a3e6831, L:/10.132.65.152:9042 - R:/10.132.11.15:48020]
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
>     at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>
> INFO [epollEventLoopGroup-2-15] 2020-02-12 11:42:46,871 Message.java:623 - Unexpected exception during request; channel = [id: 0xa071f1c8, L:/10.132.65.152:9042 - R:/10.132.11.15:45134]
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
>     at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>
> Source and Destination IP addresses are in the same DC (LAN).
>
> I did recycle all the Cassandra services on all the nodes in both clusters
> but the problem remains.
>
> The only change made recently was the adding of replicas in the second DC
> for the keyspace that is being written to when these messages occur (not
> had a chance to run a full repair yet to sync the replicas)
>
> FYI:
> Cassandra 3.11.2
> 5 Node cluster each in 2 DC’s
>
> Kind regards
> Arnulf Hanauer