Sorry, I didn't see the test procedure, it's still early.

On Aug 30, 2013, at 8:57 AM, Mike Neir <m...@liquidweb.com> wrote:
> Greetings folks,
>
> I'm faced with the need to update a 36 node cluster with roughly 25T of data
> on disk to a version of cassandra in the 1.2.x series. While it seems that
> 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling
> upgrade, I'd still like to have a roll-back plan in case the rolling upgrade
> goes sideways.
>
> I've tried to upgrade a single node in my dev cluster, then roll back using a
> snapshot taken previously, but things don't appear to be going smoothly. The
> node will rejoin the ring eventually, but not after spending some time in the
> "Joining" state as shown by "nodetool ring", and spewing a ton of error
> messages similar to the following:
>
> ERROR [MutationStage:31] 2013-08-29 14:07:20,530 RowMutationVerbHandler.java
> (line 61) Error in row mutation
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1178
>
> My test procedure is as follows:
> 1) nodetool -h localhost snapshot
> 2) nodetool -h localhost drain
> 3) service cassandra stop
> 4) back up cassandra configs
> 5) remove cassandra 1.0.9
> 6) install cassandra 1.2.8
> 7) restore cassandra configs, alter them to remove configuration entries no
> longer used
> 8) start cassandra 1.2.8, let it run for a bit, then drain/stop it
> 9) remove cassandra 1.2.8
> 10) reinstall cassandra 1.0.9
> 11) restore original cassandra configs
> 12) remove any commit logs present
> 13) remove folders for system_auth and system_traces Keyspaces (since they
> don't seem to be present in 1.0.9)
> 14) Move snapshots back to where they should be for 1.0.9 and remove cass
> 1.2.8 data
>    # cd /var/lib/cassandra/data/$KEYSPACE/
>    # mv */snapshots/$TIMESTAMP/* .
>    # find . -mindepth 1 -type d -exec rm -rf {} \;
>    # cd /var/lib/cassandra/data/system
>    # mv */snapshots/$TIMESTAMP/* .
>    # find . -mindepth 1 -type d -exec rm -rf {} \;
> 15) start cassandra 1.0.9
> 16) observe cassandra system.log
>
> Does anyone have any insight on things I may be doing wrong, or whether this
> is just an unavoidable pain point caused by rolling back? It seems that since
> there are no schema changes going on, the node should be able to just hop
> back into the cluster without error and without transitioning through the
> "Joining" state.
>
> --
>
> Mike Neir
> Liquid Web, Inc.
> Infrastructure Administrator
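
For reference, the commands in step 14 of the quoted procedure could be wrapped in a small shell helper along these lines. This is only a sketch: `restore_snapshot` is a hypothetical function name, not a Cassandra tool, and it assumes the snapshot sits under `*/snapshots/<name>` relative to the keyspace directory, as in the quoted commands. Unlike the original commands, it refuses to delete anything when no matching snapshot is found, and it limits `find` to one level so that `find` does not try to descend into directories `rm` has already removed.

```shell
#!/bin/sh
# Sketch only: restore_snapshot is a hypothetical helper, not part of Cassandra.
# Usage: restore_snapshot KEYSPACE_DIR SNAPSHOT_NAME
# The body runs in a subshell, so the caller's working directory is unchanged.
restore_snapshot() (
    ks_dir=$1
    snap=$2
    cd "$ks_dir" || exit 1

    # Check that at least one snapshot directory with this name exists
    # before touching anything.
    found=0
    for d in */snapshots/"$snap"; do
        if [ -d "$d" ]; then
            found=1
        fi
    done
    if [ "$found" -ne 1 ]; then
        echo "no snapshot '$snap' found under $ks_dir" >&2
        exit 1
    fi

    # Move the snapshot files up into the keyspace directory,
    # as the "mv */snapshots/$TIMESTAMP/* ." command in step 14 does.
    for d in */snapshots/"$snap"; do
        if [ -d "$d" ]; then
            for f in "$d"/*; do
                if [ -e "$f" ]; then
                    mv "$f" .
                fi
            done
        fi
    done

    # Remove every subdirectory of the keyspace directory, which discards
    # whatever else they contain (here, the data written by 1.2.8).
    find . -mindepth 1 -maxdepth 1 -type d -exec rm -rf {} +
)
```

It would be called once per keyspace, for example `restore_snapshot /var/lib/cassandra/data/$KEYSPACE $TIMESTAMP` and again for `/var/lib/cassandra/data/system`, with Cassandra stopped.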