Hi All,

I'm having some major issues bootstrapping a new node into my cluster.  We
are running Cassandra 1.2.16, with vnodes enabled.

When a new node starts up (with auto_bootstrap), it selects a host ID and
finds the ring successfully:

INFO 18:42:29,559 JOINING: waiting for ring information

It successfully selects a set of tokens.  Then the weird stuff begins.  I
get this error once, while the node is reading the system keyspace:

ERROR 18:42:32,921 Exception in thread Thread[InternalResponseStage:1,5,main]
java.lang.NullPointerException
        at org.apache.cassandra.utils.ByteBufferUtil.toLong(ByteBufferUtil.java:421)
        at org.apache.cassandra.cql.jdbc.JdbcLong.compose(JdbcLong.java:94)
        at org.apache.cassandra.db.marshal.LongType.compose(LongType.java:34)
        at org.apache.cassandra.cql3.UntypedResultSet$Row.getLong(UntypedResultSet.java:138)
        at org.apache.cassandra.db.SystemTable.migrateKeyAlias(SystemTable.java:199)
        at org.apache.cassandra.db.DefsTable.mergeSchema(DefsTable.java:346)
        at org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:66)
        at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:47)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


But it doesn't stop the bootstrap process.  The node successfully
handshakes versions, and pauses before bootstrapping:


 INFO 18:42:59,564 JOINING: schema complete, ready to bootstrap
 INFO 18:42:59,565 JOINING: waiting for pending range calculation
 INFO 18:42:59,565 JOINING: calculation complete, ready to bootstrap
 INFO 18:42:59,565 JOINING: getting bootstrap token
 INFO 18:42:59,705 JOINING: sleeping 30000 ms for pending range setup


After 30 seconds, I get an endless flood of
org.apache.cassandra.db.UnknownColumnFamilyException errors, and every other
node in the cluster logs the following over and over:

INFO [HANDSHAKE-/x.x.x.x] 2014-05-09 18:44:36,289 OutboundTcpConnection.java (line 418) Handshaking version with /x.x.x.x


I suspect there may be something wrong with my schemas.  Sometimes while
restarting an existing node, the node will fail to restart, with the
following error, again while reading the system keyspace:

ERROR [InternalResponseStage:5] 2014-05-05 23:56:03,786 CassandraDaemon.java (line 191) Exception in thread Thread[InternalResponseStage:5,5,main]
org.apache.cassandra.db.marshal.MarshalException: cannot parse 'column1' as hex bytes
        at org.apache.cassandra.db.marshal.BytesType.fromString(BytesType.java:69)
        at org.apache.cassandra.config.ColumnDefinition.fromSchema(ColumnDefinition.java:231)
        at org.apache.cassandra.config.CFMetaData.addColumnDefinitionSchema(CFMetaData.java:1524)
        at org.apache.cassandra.config.CFMetaData.fromSchema(CFMetaData.java:1456)
        at org.apache.cassandra.config.KSMetaData.deserializeColumnFamilies(KSMetaData.java:306)
        at org.apache.cassandra.db.DefsTable.mergeColumnFamilies(DefsTable.java:444)
        at org.apache.cassandra.db.DefsTable.mergeSchema(DefsTable.java:356)
        at org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:66)
        at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:47)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NumberFormatException: An hex string representing bytes must have an even length
        at org.apache.cassandra.utils.Hex.hexToBytes(Hex.java:52)
        at org.apache.cassandra.db.marshal.BytesType.fromString(BytesType.java:65)
        ... 12 more
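
For what it's worth, the "Caused by" line looks consistent with the literal
name 'column1' (seven characters, so an odd length) being handed to a hex
parser.  Here is a minimal sketch of that failure mode.  The class
HexParseSketch and its hexToBytes method are hypothetical stand-ins written
for illustration, not Cassandra's actual Hex.hexToBytes code, and the sketch
assumes the parser rejects odd-length input before decoding anything:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class HexParseSketch {
    // Hypothetical stand-in for a strict hex-to-bytes parser (not Cassandra's code).
    public static byte[] hexToBytes(String s) {
        // Two hex digits encode one byte, so an odd-length string cannot be valid.
        if (s.length() % 2 != 0) {
            throw new NumberFormatException(
                "An hex string representing bytes must have an even length");
        }
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(s.charAt(2 * i), 16);
            int lo = Character.digit(s.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new NumberFormatException("Non-hex character in input: " + s);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        // The raw column name is not hex at all, and its length (7) is odd.
        try {
            hexToBytes("column1");
        } catch (NumberFormatException e) {
            System.out.println("raw name rejected: " + e.getMessage());
        }

        // The same name, hex-encoded as ASCII bytes, parses without error.
        byte[] decoded = hexToBytes("636f6c756d6e31");
        System.out.println(new String(decoded, StandardCharsets.US_ASCII));
        System.out.println(Arrays.toString(decoded));
    }
}
```

If that reading is right, the schema_columns row holds a plain-text column
name where the reading node expects a hex-encoded one, but that is only a
guess from the trace.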

I am able to fix this error by clearing out the schema_columns system table
on disk.  After that, a node can boot successfully.

Does anyone have a clue what's going on here?

Thanks!
