We need to do a rolling upgrade of our production Cassandra cluster, because 
we are moving Cassandra from Solaris to CentOS.
(We went with Solaris initially because most of our other production hosts run 
Solaris, but we ran into some lockup issues during perf tests and decided to 
switch to Linux.)

Here are the steps we are following to take a node out of service and bring it 
back. Can someone comment on whether we are missing anything (e.g. is it 
recommended to specify tokens in cassandra.yaml, or to do something different 
with the seed hosts than described below)?

1. Run nodetool decommission and wait for the data to be streamed out.

2. Re-image the host to CentOS (everything is wiped off the disks), keeping 
the same Cassandra version.

3. Bring Cassandra back up.
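
To make steps 1 and 3 concrete, here is a minimal sketch of how they could be scripted. It is not the exact tooling we run. The helper names (nodetool, node_is_up_normal, wait_until_rejoined) are made up for illustration, and the parsing assumes that each node's line in the nodetool ring output starts with its address and contains the words Up and Normal, which may need adjusting for your output.

```python
import subprocess
import time


def nodetool(host, *args):
    """Run `nodetool -h <host> <args...>` and return its stdout as text."""
    result = subprocess.run(
        ["nodetool", "-h", host, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def node_is_up_normal(ring_output, address):
    """Return True if `address` has a ring line showing both Up and Normal.

    Assumption: each node's line starts with its address and lists its
    status and state as separate whitespace-delimited words.
    """
    for line in ring_output.splitlines():
        fields = line.split()
        if fields and fields[0] == address:
            return "Up" in fields and "Normal" in fields
    return False


def decommission(host, run=nodetool):
    """Step 1: ask the node to leave the ring and stream its data out."""
    run(host, "decommission")


def wait_until_rejoined(seed_host, address, run=nodetool,
                        poll_seconds=30, max_polls=120):
    """Step 3: poll the ring through the seed until the node is Up/Normal."""
    for _ in range(max_polls):
        if node_is_up_normal(run(seed_host, "ring"), address):
            return True
        time.sleep(poll_seconds)
    return False
```

The `run` parameter only exists so the logic can be exercised without a live cluster. Note that a ring check like this only shows ring membership; it says nothing about whether data or schema actually arrived on the node.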

Other details:

- We are using Cassandra 1.1.5.

- We do not specify any tokens in cassandra.yaml; we rely on bootstrap to 
assign the tokens automatically.

- We are testing with a 4-node cluster with only one seed host. The seed host 
is specified in the cassandra.yaml of each node and is not changed at any 
point.
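
For context on the token question: if tokens were pinned rather than auto-assigned, evenly spaced values would be needed for initial_token. The sketch below assumes RandomPartitioner, whose token range runs from 0 to 2**127; the partitioner is not stated above, so treat that as an assumption. The function name is illustrative only.

```python
def random_partitioner_tokens(node_count):
    """Return evenly spaced tokens over the 0 .. 2**127 range.

    Assumption: the cluster uses RandomPartitioner.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * (2 ** 127) // node_count for i in range(node_count)]


# One initial_token value per node for a 4-node test cluster.
for index, token in enumerate(random_partitioner_tokens(4)):
    print("node %d: initial_token: %d" % (index, token))
```

Whether pinning tokens like this is preferable to automatic assignment for our procedure is exactly what we are asking.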

While testing the Solaris to Linux upgrade path, things seem to work smoothly. 
The data streams out fine, and streams back in when the node comes back up. 
However, when testing the Linux to Solaris path (in case we need to roll back), 
we are having problems with nodes rejoining the ring. nodetool indicates that 
the node has rejoined the ring, but no data streams in, and the node does not 
know about the keyspaces or column families. We see some errors in the logs of 
the newly added nodes, pasted below.

[17/06/2013:14:10:17 PDT] MutationStage:1: ERROR RowMutationVerbHandler.java (line 61) Error in row mutation
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1020
        at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:126)
        at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:439)
        at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:447)
        at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:395)
        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
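
In case it helps anyone reproduce this, a small script along these lines can tally which column family ids the rejoined node is rejecting. It is only a sketch: the regular expression is written against the exception text in the excerpt above, and the function names and the idea of passing a log file path are illustrative assumptions.

```python
import re
from collections import Counter

# Matches the exception message as it appears in the log excerpt above.
CFID_PATTERN = re.compile(
    r"UnknownColumnFamilyException: Couldn't find cfId=(\d+)"
)


def count_unknown_cfids(lines):
    """Count occurrences of each unknown cfId across an iterable of lines."""
    counts = Counter()
    for line in lines:
        match = CFID_PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def count_unknown_cfids_in_file(path):
    """Same tally, reading lines from a log file at `path`."""
    with open(path, errors="replace") as log_file:
        return count_unknown_cfids(log_file)
```

Several distinct ids would suggest the node is missing the schema as a whole, not just one column family.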

Thanks,
Arindam
