Output of the corrected command (4-node cluster):

nodetool gossipinfo -h localhost |grep SCHEMA |sort | uniq -c | sort -n
      4   SCHEMA:60edeaa8-70a4-3825-90a5-d7746ffa8e4d

On the second part, I have the same Cassandra version in staging and
production, with staging being a smaller cluster. I'm not sure what you mean
by nuking the schema (i.e., deleting directories?).

Jim

From: Robert Coli <rc...@eventbrite.com>
Reply-To: <user@cassandra.apache.org>
Date: Tue, 9 Jul 2013 11:35:35 -0700
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: alter column family ?

On Tue, Jul 9, 2013 at 10:26 AM, Robert Coli <rc...@eventbrite.com> wrote:
nodetool -h localhost netstats |grep SCHEMA |sort | uniq -c | sort -n

Sorry, I meant "gossipinfo" and not "netstats".

With the right command, do you see that all nodes in the cluster have the same 
schema version?

I'm on version 1.1.2

1) Hinted Handoff is broken in 1.1.2, upgrade ASAP.
2) I believe the particular case you are encountering may be a more specific 
bug from the 1.1.2 timeframe.
3) Desynched schema like you seem to be encountering is very common in the 
1.1.2 timeframe. In most cases the best/only solution is:
   a) drain all nodes, and stop them
   b) nuke schema on all nodes (optionally nuke/move aside entire system 
keyspace)
   c) start nodes, waiting for cluster to completely coalesce
   d) re-load schema one statement at a time, BEING SURE TO WAIT FOR SCHEMA 
AGREEMENT ON ***ALL NODES*** before running the next schema altering statement
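
The wait in step d) could be scripted roughly as follows. This is only a sketch: the function names count_schema_versions and wait_for_schema_agreement are made up for illustration, and it assumes nodetool is on the PATH and that gossipinfo prints one SCHEMA:<uuid> entry per node, as in the output quoted in this thread.

```shell
# Count the distinct schema versions found in `nodetool gossipinfo`
# output read from stdin. A result of 1 means all nodes agree.
count_schema_versions() {
    # Keep only the SCHEMA:<uuid> tokens, de-duplicate, count what is left.
    grep -o 'SCHEMA:[^[:space:]]*' | sort -u | wc -l | tr -d ' '
}

# Hypothetical helper: poll one node's gossip view until exactly one
# schema version is reported. The host argument defaults to localhost.
wait_for_schema_agreement() {
    host="${1:-localhost}"
    while [ "$(nodetool -h "$host" gossipinfo | count_schema_versions)" -ne 1 ]; do
        echo "schema versions still differ; waiting..." >&2
        sleep 5
    done
}
```

Calling wait_for_schema_agreement after each schema-altering statement would block until a single version is reported. As written it also keeps looping if nodetool itself fails (the count is then 0), so a retry limit may be worth adding.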

If you are unable to take an outage on this cluster, there are other ways to 
resolve issues like this, but generally they will be both complex and 
error-prone, and will take much more time and effort than doing the above.

=Rob
