On the command (4-node cluster):

nodetool gossipinfo -h localhost |grep SCHEMA |sort | uniq -c | sort -n
      4 SCHEMA:60edeaa8-70a4-3825-90a5-d7746ffa8e4d

On the second part, I have the same Cassandra version in staging and production, with staging being a smaller cluster. Not sure what you mean by nuking schemas (i.e. delete directories?)

Jim

From: Robert Coli <rc...@eventbrite.com>
Reply-To: <user@cassandra.apache.org>
Date: Tue, 9 Jul 2013 11:35:35 -0700
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: alter column family ?

On Tue, Jul 9, 2013 at 10:26 AM, Robert Coli <rc...@eventbrite.com> wrote:
> nodetool -h localhost netstats |grep SCHEMA |sort | uniq -c | sort -n

Sorry, I meant "gossipinfo" and not "netstats". With the right command, do you see that all nodes in the cluster have the same schema version?

> I'm on version 1.1.2

1) Hinted Handoff is broken in 1.1.2, upgrade ASAP.
2) I believe the particular case you are encountering may be a more specific bug from the 1.1.2 timeframe.
3) Desynched schema like you seem to be encountering is very common in the 1.1.2 timeframe. In most cases the best/only solution is:

a) drain all nodes, and stop them
b) nuke schema on all nodes (optionally nuke/move aside entire system keyspace)
c) start nodes, waiting for cluster to completely coalesce
d) re-load schema one statement at a time, BEING SURE TO WAIT FOR SCHEMA AGREEMENT ON ***ALL NODES*** before running the next schema-altering statement

If you are unable to take an outage on this cluster, there are other ways to resolve issues like this, but generally they will be both complex and error prone and will take much more time and effort than doing the above.

=Rob
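
The "wait for schema agreement" part of step d) could be scripted roughly as below. This is only a sketch: the function names `count_schema_versions` and `wait_for_schema_agreement` are made up for illustration, and it assumes gossipinfo prints one `SCHEMA:<uuid>` entry per node, as in the output quoted in this thread. Seeing a single version only proves that the nodes which reported a schema agree with each other, so it is still worth comparing the count against the expected number of nodes.

```shell
#!/bin/sh
# Print how many distinct schema versions appear in gossipinfo text on stdin.
# Assumes entries of the form "SCHEMA:<uuid>", as in the output quoted above.
count_schema_versions() {
    grep -o 'SCHEMA:[^[:space:]]*' | sort -u | wc -l | tr -d '[:space:]'
}

# Hypothetical helper: run the given command (expected to print gossipinfo
# output) until it shows exactly one schema version, or give up.
# Usage: wait_for_schema_agreement MAX_TRIES SLEEP_SECONDS command [args...]
wait_for_schema_agreement() {
    max_tries=$1
    pause=$2
    shift 2
    try=1
    while [ "$try" -le "$max_tries" ]; do
        versions=$("$@" | count_schema_versions)
        if [ "$versions" -eq 1 ]; then
            return 0
        fi
        try=$((try + 1))
        sleep "$pause"
    done
    echo "schema still disagrees after $max_tries checks" >&2
    return 1
}
```

It would be called between schema statements, for example as `wait_for_schema_agreement 30 2 nodetool gossipinfo -h localhost`, running the next statement only if it returns success.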