Re: repair failed

gloCalHelp.com Sun, 29 Dec 2019 00:55:18 -0800

To Oliver: Maybe the repair should be run only after all data in the memtables 
has been flushed to disk?
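
As a rough sketch of what that would look like, the snippet below runs `nodetool flush` before the repair command quoted in the original mail. The function names `build_commands` and `flush_then_repair` are made up for illustration, and `nodetool flush` only flushes the node it connects to, so this assumes it is run on (or pointed at) each node of interest. Whether flushing first actually avoids the snapshot error is not established in this thread.

```python
import subprocess
from typing import List, Optional


def build_commands(host: Optional[str] = None) -> List[List[str]]:
    """Return the nodetool invocations in order: flush first, then repair."""
    base = ["nodetool"]
    if host is not None:
        base += ["-h", host]  # -h selects the node nodetool connects to
    return [
        base + ["flush"],                      # write memtables out as SSTables
        base + ["repair", "-full", "-dcpar"],  # same options as in the original mail
    ]


def flush_then_repair(host: Optional[str] = None) -> None:
    """Run each command in turn; raises CalledProcessError on the first failure."""
    for cmd in build_commands(host):
        subprocess.run(cmd, check=True)
```

Calling `flush_then_repair()` with no argument targets the local node, which matches how the weekly job is described.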






Sincerely yours,

Georgelin

www_8ems_...@sina.com

mobile:0086 180 5986 1565




----- Original Message -----
From: Oliver Herrmann <o.herrmann...@gmail.com>
To: user@cassandra.apache.org
Subject: repair failed
Date: 28 Dec 2019, 23:15

Hello,

Today our weekly repair job failed for the second time; it had been working 
for many months without a problem. We have multiple Cassandra nodes in two 
data centers.

The repair command is started only on one node with the following parameters:

nodetool repair -full -dcpar 

Is it problematic if the repair is started only on one node? 
The repair fails after one hour with the following error message:

 failed with error Could not create snapshot at /192.168.13.232 (progress: 0%)
[2019-12-28 05:00:04,295] Some repair failed
[2019-12-28 05:00:04,296] Repair command #1 finished in 1 hour 0 minutes 2 seconds
error: Repair job has failed with the error message: [2019-12-28 05:00:04,295] Some repair failed
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message: [2019-12-28 05:00:04,295] Some repair failed
        at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:116)
        at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
        at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(Unknown Source)
        at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(Unknown Source)
        at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(Unknown Source)
        at com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(Unknown Source)
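
When the repair runs unattended, it can help to pull the failing endpoint out of the nodetool output so the right node's logs get checked. The following is a minimal sketch; the helper name `failed_snapshot_endpoint` is hypothetical, and the pattern is based only on the error line quoted above, so other Cassandra versions may word the message differently.

```python
import re
from typing import Optional

# Matches the error line quoted in this mail; IPv4 addresses only.
SNAPSHOT_ERROR = re.compile(r"Could not create snapshot at /(\d{1,3}(?:\.\d{1,3}){3})")


def failed_snapshot_endpoint(output: str) -> Optional[str]:
    """Return the address named in the first snapshot error, or None if absent."""
    match = SNAPSHOT_ERROR.search(output)
    return match.group(1) if match else None
```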

In the log files on 192.168.13.232, which is in the second data center, I could 
find only the following messages, and only in debug.log:

DEBUG [COMMIT-LOG-ALLOCATOR] 2019-12-28 04:21:20,143 AbstractCommitLogSegmentManager.java:109 - No segments in reserve; creating a fresh one
DEBUG [MessagingService-Outgoing-192.168.13.120-Small] 2019-12-28 04:31:00,450 OutboundTcpConnection.java:410 - Socket to 192.168.13.120 closed
DEBUG [MessagingService-Outgoing-192.168.13.120-Small] 2019-12-28 04:31:00,450 OutboundTcpConnection.java:349 - Error writing to 192.168.13.120
java.io.IOException: Connection timed out
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.8.0_111]

We tried to run repair a few more times but it always failed with the same 
error. After restarting all nodes it was finally successful.

Any idea what could be wrong?
Regards,
Oliver
