To Oliver: Maybe the repair should be run only after all data in the memtables
has been flushed to disk?
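
A minimal sketch of that idea, assuming it is run on the node where the repair is started. `nodetool flush` writes the node's memtables to disk, and the repair command is the one from the original message. The `NODETOOL` variable and the function name `flush_then_repair` are only illustrative, and whether flushing first actually avoids the snapshot error is not confirmed.

```shell
#!/bin/sh
# Sketch only: flush memtables to disk, then start the full repair.
# NODETOOL is a hypothetical override hook (e.g. for a dry run); it defaults to nodetool.
NODETOOL="${NODETOOL:-nodetool}"

flush_then_repair() {
    # Write all memtables on this node to SSTables first; stop if that fails.
    $NODETOOL flush || return 1
    # Same repair invocation as in the original report.
    $NODETOOL repair -full -dcpar
}
```

Calling `flush_then_repair` runs both steps in order. Note that `nodetool flush` only affects the node it is run on, so the other nodes would need their own flush.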





Sincerely yours,

Georgelin

www_8ems_...@sina.com

mobile:0086 180 5986 1565




----- Original Message -----
From: Oliver Herrmann <o.herrmann...@gmail.com>
To: user@cassandra.apache.org
Subject: repair failed
Date: 2019-12-28 23:15

Hello,

Today our weekly repair job failed for the second time; it had been working 
for many months without a problem. We have multiple Cassandra nodes in two 
data centers. 

The repair command is started only on one node with the following parameters:

nodetool repair -full -dcpar 

Is it problematic if the repair is started only on one node? 
The repair fails after one hour with the following error message:

 failed with error Could not create snapshot at /192.168.13.232 (progress: 0%)
[2019-12-28 05:00:04,295] Some repair failed
[2019-12-28 05:00:04,296] Repair command #1 finished in 1 hour 0 minutes 2 seconds
error: Repair job has failed with the error message: [2019-12-28 05:00:04,295] Some repair failed
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message: [2019-12-28 05:00:04,295] Some repair failed
        at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:116)
        at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
        at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(Unknown Source)
        at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(Unknown Source)
        at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(Unknown Source)
        at com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(Unknown Source)

In the log files on 192.168.13.232, which is in the second data center, I could 
find only the following messages, and only in debug.log:

DEBUG [COMMIT-LOG-ALLOCATOR] 2019-12-28 04:21:20,143 AbstractCommitLogSegmentManager.java:109 - No segments in reserve; creating a fresh one
DEBUG [MessagingService-Outgoing-192.168.13.120-Small] 2019-12-28 04:31:00,450 OutboundTcpConnection.java:410 - Socket to 192.168.13.120 closed
DEBUG [MessagingService-Outgoing-192.168.13.120-Small] 2019-12-28 04:31:00,450 OutboundTcpConnection.java:349 - Error writing to 192.168.13.120
java.io.IOException: Connection timed out
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.8.0_111]

We tried to run repair a few more times but it always failed with the same 
error. After restarting all nodes it was finally successful.

Any idea what could be wrong?
Regards,
Oliver
