Sorry, I didn't see the test procedure, it's still early.

On Aug 30, 2013, at 8:57 AM, Mike Neir <m...@liquidweb.com> wrote:

> Greetings folks,
> 
> I'm faced with the need to update a 36 node cluster with roughly 25T of data 
> on disk to a version of cassandra in the 1.2.x series. While it seems that 
> 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling 
> upgrade, I'd still like to have a roll-back plan in case the rolling upgrade 
> goes sideways.
> 
> I've tried to upgrade a single node in my dev cluster, then roll back using a 
> snapshot taken previously, but things don't appear to be going smoothly. The 
> node will rejoin the ring eventually, but only after spending some time in the 
> "Joining" state as shown by "nodetool ring", and spewing a ton of error 
> messages similar to the following:
> 
> ERROR [MutationStage:31] 2013-08-29 14:07:20,530 RowMutationVerbHandler.java 
> (line 61) Error in row mutation
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1178
> 
> My test procedure is as follows:
> 1)  nodetool -h localhost snapshot
> 2)  nodetool -h localhost drain
> 3)  service cassandra stop
> 4)  back up cassandra configs
> 5)  remove cassandra 1.0.9
> 6)  install cassandra 1.2.8
> 7)  restore cassandra configs, alter them to remove configuration entries no 
> longer used
> 8)  start cassandra 1.2.8, let it run for a bit, then drain/stop it
> 9)  remove cassandra 1.2.8
> 10) reinstall cassandra 1.0.9
> 11) restore original cassandra configs
> 12) remove any commit logs present
> 13) remove folders for system_auth and system_traces Keyspaces (since they 
> don't seem to be present in 1.0.9)
> 14) Move snapshots back to where they should be for 1.0.9 and remove cass 
> 1.2.8 data
>  # cd /var/lib/cassandra/data/$KEYSPACE/
>  # mv */snapshots/$TIMESTAMP/* .
>  # find . -mindepth 1 -type d -exec rm -rf {} \;
>  # cd /var/lib/cassandra/data/system
>  # mv */snapshots/$TIMESTAMP/* .
>  # find . -mindepth 1 -type d -exec rm -rf {} \;
> 15) start cassandra 1.0.9
> 16) observe cassandra system.log
> 
> Does anyone have any insight on things I may be doing wrong, or whether this 
> is just an unavoidable pain point caused by rolling back? It seems that since 
> there are no schema changes going on, the node should be able to just hop 
> back into the cluster without error and without transitioning through the 
> "Joining" state.
> 
> -- 
> 
> 
> 
> Mike Neir
> Liquid Web, Inc.
> Infrastructure Administrator
> 
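
For reference, step 14 of the quoted procedure could be sketched as a small shell function. This is only a sketch under assumptions, not a tested rollback tool: it assumes the snapshot files sit under <keyspace dir>/<column family>/snapshots/<timestamp>/, which is what the mv glob in step 14 implies, and the name restore_snapshot is made up for illustration. It differs from the quoted commands in two ways: it refuses to delete anything when no snapshot files match, and it limits find to the top-level directories so that find does not try to descend into directories that rm has already removed.

```shell
#!/bin/sh
# Hypothetical helper sketching step 14: move snapshot files back to the
# top of a keyspace directory, then drop the per-column-family directories.
# Assumed layout: <keyspace_dir>/<column_family>/snapshots/<timestamp>/<files>
restore_snapshot() (
    keyspace_dir=$1
    timestamp=$2

    # Runs in a subshell, so the caller's working directory is unchanged.
    cd "$keyspace_dir" || exit 1

    # Expand the snapshot glob; if nothing matches, the pattern stays
    # literal and the existence check below fails before anything is removed.
    set -- */snapshots/"$timestamp"/*
    if [ ! -e "$1" ]; then
        echo "no files for snapshot $timestamp under $keyspace_dir" >&2
        exit 1
    fi

    # Move the snapshot files to the top of the keyspace directory.
    mv "$@" .

    # Remove only the first-level directories (and everything below them).
    find . -mindepth 1 -maxdepth 1 -type d -exec rm -rf {} +
)

# Example call, using the variables from the quoted procedure:
#   restore_snapshot "/var/lib/cassandra/data/$KEYSPACE" "$TIMESTAMP"
```

It would be run once per keyspace directory, including the system one, with Cassandra stopped, just as in the quoted steps.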
