The entry in the debug.log is not specific to a repair session, and it can also have causes other than network connectivity issues, such as long stop-the-world (STW) GC pauses. I usually don't start troubleshooting an issue from the debug log, as it can be rather noisy. The system.log is a better starting point.

If I were to troubleshoot the issue, I would start from the system logs on the node that initiated the repair, i.e. the node you ran the "nodetool repair" command on. Follow the repair ID (a UUID) in the logs on all nodes involved in the repair, and read all related log entries in chronological order to find out exactly what happened.
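As a rough illustration of following one repair ID across nodes, here is a minimal Python sketch. It assumes you have copied each node's system.log to one machine, that the lines carry the usual "yyyy-MM-dd HH:mm:ss,SSS" timestamp, and that the nodes' clocks are reasonably in sync. The function name, node labels and file paths are hypothetical.

```python
import re
from datetime import datetime

# Matches the timestamp used in the log lines, e.g. "2022-01-14 05:00:19,031".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def repair_timeline(logs_by_node, repair_id):
    """Return (timestamp, node, line) tuples for every line that mentions
    repair_id, sorted chronologically across all nodes.

    logs_by_node maps a node label (hypothetical, pick your own) to an
    iterable of log lines, such as an open file object.
    """
    events = []
    for node, lines in logs_by_node.items():
        for line in lines:
            if repair_id not in line:
                continue
            match = TIMESTAMP.search(line)
            if match is None:
                # Continuation lines (stack traces) have no timestamp; skipped here.
                continue
            when = datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S,%f")
            events.append((when, node, line.rstrip("\n")))
    # Sort by time first, then by node label to keep ties in a stable order.
    events.sort(key=lambda event: (event[0], event[1]))
    return events


# Example usage (paths and labels are placeholders):
# with open("node1/system.log") as a, open("node2/system.log") as b:
#     for when, node, line in repair_timeline({"node1": a, "node2": b}, "<repair UUID>"):
#         print(node, line)
```

This only narrows the logs down to the lines that name the repair ID. Entries that don't carry the ID (for example GC pause messages around the same time) still need to be checked by hand.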

BTW, if the issue is easily reproducible, I would re-run the repair with a reduced scope (such as a single table and token range) to get fewer logs related to the repair session. Fewer logs means less time spent reading and analysing them.
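For example, a reduced-scope run could be put together along these lines. This is only a sketch: the helper name, keyspace, table and token values are placeholders, and it assumes the -st/-et (start/end token) options of "nodetool repair". Please check "nodetool help repair" on your version before relying on it.

```python
def scoped_repair_command(keyspace, table, start_token, end_token):
    """Build the argument list for a repair limited to one table and one
    token range. All argument values are supplied by the caller."""
    return [
        "nodetool", "repair",
        "-st", str(start_token),  # start of the token range
        "-et", str(end_token),    # end of the token range
        keyspace, table,
    ]


# Placeholder values; substitute a real keyspace, table and token range.
command = scoped_repair_command("my_keyspace", "my_table", 0, 1000000)
print(" ".join(command))
```

Running the printed command on the node that normally initiates the repair should give a much smaller set of log entries to read through.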

Hope this helps.

On 18/01/2022 10:03, manish khandelwal wrote:
I have a Cassandra 3.11.2 cluster with two DCs. While running repair, I am observing the following behavior.

I am seeing that a node is not able to receive the merkle tree from one or two nodes. I can also see that the missing nodes did send the merkle tree, but it was not received. This makes the repair hang on a consistent basis. In netstats I can see the following output:

Mode: NORMAL
Not sending any streams. Attempted: 7858888
Mismatch (Blocking): 2560
Mismatch (Background): 17173
Pool Name          Active   Pending   Completed   Dropped
Large messages        n/a         0        6313         3
Small messages        n/a         0    55978004         3
Gossip messages       n/a         0       93756       125

Does it represent network issues? In Debug logs I saw something

DEBUG [MessagingService-Outgoing-hostname/xxx.yy.zz.kk-Large] 2022-01-14 05:00:19,031 OutboundTcpConnection.java:349 - Error writing to hostname/xxx.yy.zz.kk
java.io.IOException: Connection timed out
at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.8.0_221]
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[na:1.8.0_221]
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[na:1.8.0_221]
at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[na:1.8.0_221]
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) ~[na:1.8.0_221]
at java.nio.channels.Channels.writeFullyImpl(Channels.java:78) ~[na:1.8.0_221]
at java.nio.channels.Channels.writeFully(Channels.java:98) ~[na:1.8.0_221]
at java.nio.channels.Channels.access$000(Channels.java:61) ~[na:1.8.0_221]
at java.nio.channels.Channels$1.write(Channels.java:174) ~[na:1.8.0_221]
at net.jpountz.lz4.LZ4BlockOutputStream.flushBufferedData(LZ4BlockOutputStream.java:205) ~[lz4-1.3.0.jar:na]
at net.jpountz.lz4.LZ4BlockOutputStream.write(LZ4BlockOutputStream.java:158) ~[lz4-1.3.0.jar:na]

Does this show any network fluctuations?

Regards
Manish
