Hi list,

we added a new node to our existing 8-node cluster running C* 1.2.9 without
vnodes, and because we are almost totally out of space, we are moving the
tokens one node after another (not in parallel). During one of these move
operations, the receiving node died and the streaming failed:

 WARN [Streaming to /X.Y.Z.18:2] 2014-12-19 19:25:56,227
StorageService.java (line 3703) Streaming to /X.Y.Z.18 failed
 INFO [RMI TCP Connection(12940)-X.Y.Z.17] 2014-12-19 19:25:56,233
ColumnFamilyStore.java (line 629) Enqueuing flush of
Memtable-local@433096244(70/70 serialized/live bytes, 2 ops)
 INFO [FlushWriter:3772] 2014-12-19 19:25:56,238 Memtable.java (line
461) Writing Memtable-local@433096244(70/70 serialized/live bytes, 2 ops)
ERROR [Streaming to /X.Y.Z.18:2] 2014-12-19 19:25:56,246
CassandraDaemon.java (line 192) Exception in thread Thread[Streaming to
/X.Y.Z.18:2,5,RMI Runtime]
java.lang.RuntimeException: java.io.IOException: Broken pipe
        at com.google.common.base.Throwables.propagate(Throwables.java:160)
        at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)

After restarting the receiving node, we tried to perform the move again,
but it failed with:

Exception in thread "main" java.io.IOException: target token
113427455640312821154458202477256070486 is already owned by another node.
        at
org.apache.cassandra.service.StorageService.move(StorageService.java:2930)

So we tried to move it to a token just 1 higher, to trigger the
movement. This didn't move anything, but it finished successfully:

 INFO [Thread-5520] 2014-12-19 20:00:24,689 StreamInSession.java (line
199) Finished streaming session 4974f3c0-87b1-11e4-bf1b-97d9ac6bd256
from /X.Y.Z.18

Now, it seems quite improbable that the first streaming actually completed
and the node died just after copying everything, since the ERROR was the
last streaming-related message in the logs. Is there any way to verify
that the data were really moved, so that running nodetool cleanup is safe?
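In case it helps, these are the checks we can think of so far (a sketch only; hosts are the placeholders from the logs above, and these commands need to be run against the live cluster):

```shell
# Confirm no streams are still pending or hung on either node:
nodetool -h X.Y.Z.17 netstats
nodetool -h X.Y.Z.18 netstats

# Verify the ring now shows the intended token ownership and load:
nodetool -h X.Y.Z.17 ring

# Compare per-column-family data sizes / key estimates on the nodes
# involved, before and after the move:
nodetool -h X.Y.Z.18 cfstats
```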
   
Thank you.
Jiri Hoky
