On Sat, Jul 9, 2011 at 4:47 PM, aaron morton <aa...@thelastpickle.com> wrote:
> Check the log on all the machines for ERROR messages. An error on any of
> the nodes could have caused the streaming to hang. nodetool netstats will
> let you know if there is a failed stream.
>

Here's what I see in the logs on the node I'm streaming from:

 INFO 21:31:48,741 Streaming to /x.x.x.2
 WARN 21:34:04,064 MemoryMeter uninitialized (jamm not specified as java agent); assuming liveRatio of 10.0. Usually this means cassandra-env.sh disabled jamm because you are using a buggy JRE; upgrade to the Sun JRE instead
 WARN 21:34:15,716 MemoryMeter uninitialized (jamm not specified as java agent); assuming liveRatio of 10.0. Usually this means cassandra-env.sh disabled jamm because you are using a buggy JRE; upgrade to the Sun JRE instead

Here's what I see in the logs on the node that's being streamed to:

 INFO 21:34:15,000 Enqueuing flush of Memtable-MyCF1@409163361(15568/194600 serialized/live bytes, 4 ops)
 INFO 21:34:15,062 Enqueuing flush of Memtable-MyCF2@469885942(23/287 serialized/live bytes, 2 ops)
 INFO 21:34:15,062 Writing Memtable-MyCF1@409163361(15568/194600 serialized/live bytes, 4 ops)
 INFO 21:35:05,063 Enqueuing flush of Memtable-MyCF3@97707952(494145/6176812 serialized/live bytes, 494 ops)
ERROR 21:36:58,886 Fatal exception in thread Thread[Thread-118,5,main]
java.lang.RuntimeException: Cannot recover SSTable with version f (current version g).
        at org.apache.cassandra.io.sstable.SSTableWriter.createBuilder(SSTableWriter.java:240)
        at org.apache.cassandra.db.compaction.CompactionManager.submitSSTableBuild(CompactionManager.java:1092)
        at org.apache.cassandra.streaming.StreamInSession.finished(StreamInSession.java:110)
        at org.apache.cassandra.streaming.IncomingStreamReader.readFile(IncomingStreamReader.java:104)
        at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:61)
        at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:162)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:95)

Also, I ran the following on the node I'm decommissioning:

$ nodetool -h localhost netstats x.x.x.2
Mode: Leaving: streaming data to other nodes
Streaming to: /x.x.x.2
   /var/lib/cassandra/data/MyKS/MyCF3-f-5100-Data.db sections=1 progress=26253960206/26253960206 - 100%
   /var/lib/cassandra/data/MyKS/MyCF3-g-23646-Data.db sections=1 progress=0/642731999 - 0%
   /var/lib/cassandra/data/MyKS/MyCF3-g-8614-Data.db sections=1 progress=0/11024712282 - 0%
   [...]

This is where it hangs.

> AFAIK if you restart the cass service on 1 it will forget it was leaving and
> rejoin in a normal state.
>

I've tried restarting, and the node does rejoin the ring, but I get the same result when I try to decommission again.

Thanks,
Casey
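
For anyone lining up the netstats output above with the "version f (current version g)" error, here is a minimal Python sketch that groups the streamed files by the version letter in their filenames and lists the files that have not finished. It is not a Cassandra tool. The helper names (parse_netstats, summarize_by_version, pending) are hypothetical, and it assumes the <CF>-<version>-<generation>-Data.db filename layout and the "progress=done/total" format seen in the paths above. SAMPLE is copied from the output in this message.

```python
import re
from collections import defaultdict

# Copied from the netstats output quoted in this message.
SAMPLE = """\
Mode: Leaving: streaming data to other nodes
Streaming to: /x.x.x.2
   /var/lib/cassandra/data/MyKS/MyCF3-f-5100-Data.db sections=1 progress=26253960206/26253960206 - 100%
   /var/lib/cassandra/data/MyKS/MyCF3-g-23646-Data.db sections=1 progress=0/642731999 - 0%
   /var/lib/cassandra/data/MyKS/MyCF3-g-8614-Data.db sections=1 progress=0/11024712282 - 0%
"""

# Assumed layout: <path>/<CF>-<version>-<generation>-Data.db sections=N progress=done/total
STREAM_LINE = re.compile(
    r"(?P<path>\S+-(?P<version>[a-z]+)-(?P<generation>\d+)-Data\.db)"
    r"\s+sections=\d+\s+progress=(?P<done>\d+)/(?P<total>\d+)"
)


def parse_netstats(text):
    """Return one dict per streamed data file found in netstats-style text."""
    return [
        {
            "path": m.group("path"),
            "version": m.group("version"),
            "done": int(m.group("done")),
            "total": int(m.group("total")),
        }
        for m in STREAM_LINE.finditer(text)
    ]


def summarize_by_version(files):
    """Aggregate file count and byte progress per SSTable version letter."""
    summary = defaultdict(lambda: {"files": 0, "done": 0, "total": 0})
    for f in files:
        entry = summary[f["version"]]
        entry["files"] += 1
        entry["done"] += f["done"]
        entry["total"] += f["total"]
    return dict(summary)


def pending(files):
    """Return the files whose transfer has not reached 100%."""
    return [f for f in files if f["done"] < f["total"]]


if __name__ == "__main__":
    files = parse_netstats(SAMPLE)
    for version, entry in sorted(summarize_by_version(files).items()):
        print(version, entry)
    for f in pending(files):
        print("not finished:", f["path"])
```

On this sample, the only file that reached 100% is the one version-f file, and both version-g files are still at 0, which is consistent with the receiving node throwing the exception right after the f file finished.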