Done and done. I'm really loving how easy the nuclear option has been (it was what I tested first).
will

On Tue, Apr 26, 2011 at 5:09 PM, aaron morton <aa...@thelastpickle.com> wrote:

> In 0.7.X the cli waits for the schema to agree before returning, you should
> see...
>
> Waiting for schema agreement...
> ... schemas agree across the cluster
>
> Or if things fail
> The schema has not settled in %d seconds; further migrations are
> ill-advised until it does.%nVersions are %s%n
>
> WRT the error, first guess is something in the schema has changed and it's
> upsetting the log replay. Given all the crazy i'd go with the nuclear
> option.
>
> Aaron
>
> On 27 Apr 2011, at 07:11, William Oberman wrote:
>
> > In my test cluster I managed to jam up a cassandra server. I figure the
> > easy & failsafe solution is to just boot a replacement node, but I thought
> > I'd try a minute to either figure out what I did, or try to figure out how
> > to properly recover it before I lose my current state.
> >
> > The symptom = on startup I get an exception:
> > ERROR 11:58:34,567 Exception encountered during startup.
> > java.lang.IndexOutOfBoundsException: 6
> >     at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:121)
> >     at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:56)
> >     at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45)
> >     at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29)
> >     at java.util.concurrent.ConcurrentSkipListMap$ComparableUsingComparator.compareTo(ConcurrentSkipListMap.java:606)
> >     at java.util.concurrent.ConcurrentSkipListMap.findPredecessor(ConcurrentSkipListMap.java:685)
> >     at java.util.concurrent.ConcurrentSkipListMap.doPut(ConcurrentSkipListMap.java:864)
> >     at java.util.concurrent.ConcurrentSkipListMap.putIfAbsent(ConcurrentSkipListMap.java:1893)
> >     at org.apache.cassandra.db.ColumnFamily.addColumn(ColumnFamily.java:216)
> >     at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:130)
> >     at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:120)
> >     at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380)
> >     at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:253)
> >     at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:156)
> >     at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:173)
> >     at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:314)
> >     at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:79)
> >
> > Where things went wrong = I had been doing various testing and unit
> > testing, as this is my "proof of concept" cluster. The unit tests in
> > particular work by cloning a keyspace as "keyspace_UUID" (to get a blank
> > slate). Because of various bugs in my code and configuration, this left a
> > fair amount of crud keyspaces by the time I got everything to pass.
> > So, I wrote a script to drop all of the test keyspaces (the script had
> > worked on a single node environment, which was my first step before the
> > cluster). I think the CLI doesn't wait for schema propagation, so the
> > script confused the node I was talking to, as after it ran the schema
> > UUIDs of that node vs. the rest of the cluster didn't agree ("describe
> > cluster" in the CLI). And, it wasn't fixing itself. "nodetool loadbalance"
> > said it would do a decommission/bootstrap, which I thought might give the
> > bad node a kick in the pants, so I tried it. Afterwards, I ran "nodetool
> > ring" against all nodes and the problem node claimed all was "UP", but
> > everything else listed the problem node as "?" and everything else as UP
> > (sadly, I either didn't check or can't remember what "nodetool ring" said
> > before loadbalance). So, I shut down the problem node. But, when I tried
> > to restart it, I got the error you see above.
> >
> > Not sure what was the worst/dumbest thing I did, but it's definitely
> > unhappy now!
> >

-- 
Will Oberman
Civic Science, Inc.
3030 Penn Avenue., First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) ober...@civicscience.com
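
For anyone scripting bulk keyspace drops against a multi-node cluster, the failure described in this thread suggests pausing after each drop until every live node reports the same schema version, rather than firing the drops back to back. Below is a minimal Python sketch of that idea, not how the CLI itself does it. The `drop` and `get_versions` callables are hypothetical hooks you would wire to your own client; `get_versions` is assumed to return a mapping of schema version to the hosts reporting it (the same information "describe cluster" shows), and the "UNREACHABLE" key for down nodes is likewise an assumption.

```python
import time


def schemas_agree(versions):
    """Return True when all reachable nodes report exactly one schema version.

    `versions` maps schema version -> list of hosts (assumed shape).
    Hosts listed under the assumed "UNREACHABLE" key are ignored.
    """
    live = {v: hosts for v, hosts in versions.items() if v != "UNREACHABLE"}
    return len(live) == 1


def wait_for_schema_agreement(get_versions, timeout=10.0, poll=0.5,
                              sleep=time.sleep, clock=time.monotonic):
    """Poll get_versions() until the schema settles or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if schemas_agree(get_versions()):
            return True
        if clock() >= deadline:
            return False
        sleep(poll)


def drop_keyspaces(names, drop, get_versions, **wait_kwargs):
    """Drop keyspaces one at a time, stopping if the schema fails to settle.

    `drop` is a hypothetical callable that drops a single keyspace by name.
    """
    dropped = []
    for name in names:
        drop(name)
        if not wait_for_schema_agreement(get_versions, **wait_kwargs):
            raise RuntimeError(
                "schema has not settled after dropping %r; "
                "not issuing further drops" % name)
        dropped.append(name)
    return dropped
```

Stopping at the first drop that fails to settle mirrors the CLI's own warning that further migrations are ill-advised until the schema agrees. Ignoring unreachable nodes is a design choice; a stricter script could treat any unreachable node as "not settled".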