I generally see these exceptions when the cluster is overloaded. I think
what's happening is that when the app/driver sends a read request, the
coordinator takes a long time to respond because the nodes are busy serving
other requests. The driver gives up (client-side timeout reached) and the
socket is closed. Meanwhile, the coordinator eventually gets results from
replicas and tries to send the response back to the app/driver but can't
because the connection is no longer there. Does this scenario sound
plausible for your cluster?
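
To make the sequence concrete, here is a minimal Python sketch of that scenario. It uses plain sockets, not Cassandra or a real driver. The function name `simulate_client_timeout`, the delays, and the fake request and response payloads are all made up for illustration. I also assume the client aborts the connection with SO_LINGER set to 0 so that the reset is deterministic; a real driver may close its sockets differently.

```python
import socket
import struct
import threading
import time


def simulate_client_timeout(server_delay=0.5, client_timeout=0.1):
    """Return the exception seen by the slow 'coordinator', or None."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    seen = {}

    def coordinator():
        conn, _ = listener.accept()
        try:
            conn.recv(1024)            # the read request arrives
            time.sleep(server_delay)   # node is busy serving other requests
            conn.sendall(b"rows")      # respond to a client that has gone
            conn.recv(1024)            # next read on the dead channel
            seen["error"] = None
        except ConnectionError as exc:
            seen["error"] = exc
        finally:
            conn.close()

    worker = threading.Thread(target=coordinator)
    worker.start()

    client = socket.create_connection(("127.0.0.1", port))
    client.settimeout(client_timeout)
    client.sendall(b"fake read request")
    try:
        client.recv(1024)
    except socket.timeout:
        # Client-side timeout reached: give up and abort the connection.
        # Assumption: linger 0 makes close() send a TCP reset (RST).
        client.setsockopt(
            socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0)
        )
    client.close()

    worker.join()
    listener.close()
    return seen.get("error")


if __name__ == "__main__":
    print(repr(simulate_client_timeout()))
```

On Linux I would expect the coordinator side to report a connection-reset style error here, which is the same class of failure as the "Connection reset by peer" in your log: the server is still working on a request for a client that has already walked away.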

Erick Ramirez  |  Developer Relations

erick.rami...@datastax.com | datastax.com <http://www.datastax.com>

On Wed, 12 Feb 2020 at 21:13, Hanauer, Arnulf, Vodacom South Africa
(External) <arnulf.hana...@vcontractor.co.za> wrote:

> Hi Cassandra folks,
>
>
>
> We are getting a lot of these errors and transactions are timing out. I
> was wondering whether this can be caused by Cassandra itself or whether it
> is purely a Linux network issue. The client job reports a Cassandra node
> as down after this occurs, but I suspect that is due to the connection
> failure. I need some clarification on where to look for a solution.
>
>
>
>
>
> INFO  [epollEventLoopGroup-2-10] 2020-02-12 11:53:42,748 Message.java:623 - Unexpected exception during request; channel = [id: 0x8a3e6831, L:/10.132.65.152:9042 - R:/10.132.11.15:48020]
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
>         at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>
> INFO  [epollEventLoopGroup-2-15] 2020-02-12 11:42:46,871 Message.java:623 - Unexpected exception during request; channel = [id: 0xa071f1c8, L:/10.132.65.152:9042 - R:/10.132.11.15:45134]
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
>         at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>
>
>
>
>
> Source and Destination IP addresses are in the same DC (LAN).
>
>
>
> I did recycle all the Cassandra services on all the nodes in both clusters
> but the problem remains.
>
>
>
> The only recent change was adding replicas in the second DC for the
> keyspace that is being written to when these messages occur (I have not
> yet had a chance to run a full repair to sync the replicas).
>
>
>
>
>
> FYI:
>
> Cassandra 3.11.2
>
> 5-node cluster in each of 2 DCs
>
>
>
>
>
> Kind regards
> Arnulf Hanauer