Hi all,

Since upgrading from 1.2.13 to 2.0.10 last week, we’ve been having trouble with repairs. nodetool repair keeps returning "Lost notification. You should check server log for repair status of keyspace." This usually happens partway through the repair, yet the logs on the node show the repair continuing and sometimes completing.
When the repair doesn’t finish, it looks like the repair snapshot has been removed from one of the nodes running the validation. I see errors like "Validation failed in [ip address]" on the node being repaired, and a matching error on the validating node:

ERROR [ValidationExecutor:6] 2014-12-13 10:48:43,390 CassandraDaemon.java (line 199) Exception in thread Thread[ValidationExecutor:6,1,main]
java.lang.RuntimeException: java.io.FileNotFoundException: /raid0/cassandra/data/[ks]/[cf]/snapshots/4d888660-82b2-11e4-ad65-db08cc65545a/[ks]-[cf]-jb-51149-Data.db (No such file or directory)

I see that some repair issues were fixed in 2.0.3, but nothing since then. Has anyone else hit this?

Thanks!
-Allan