After restarting a Cassandra 0.7.2 node, the node catches an exception
during initialization and refuses to start:
Caused by: org.apache.cassandra.config.ConfigurationException: Attempt to assign id to existing column family.
        at org.apache.cassandra.config.CFMetaData.map(CFMetaData.java:222)
        at org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:477)
        ... 2 more
Unlike in a previous thread on this topic
(http://www.mail-archive.com/user@cassandra.apache.org/msg09024.html),
we are not trying to preserve the JVM across restarts: each restart
comes up in an entirely fresh JVM. We are, however, embedding Cassandra
in our application, but we bring it up using the same steps as
AbstractCassandraDaemon.
Looking briefly through the code, the only way I can see this happening
is if loadSchemas tries to load information about the system table from
storage, since the system table's column families may already have been
mapped in CFMetaData by the earlier
DatabaseDescriptor.getTableMetaData(Table.SYSTEM_TABLE).values() call.
Alternatively, the data on disk could have multiple entries under the
same key, but the system-table explanation seems more likely to me.
Unfortunately, the logging is not specific enough for me to tell which
key it fails on, and I haven't been able to reproduce this yet.
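To make the hypothesis concrete, here is a heavily simplified, hypothetical model of the check that appears to raise the exception: a registry keyed by (keyspace, column family) that refuses to map the same pair twice. This is not the actual CFMetaData source; the class, method, and the column family name "LocationInfo" are assumptions used only to show how registering the system keyspace at startup and then replaying a stored schema that also contains it would trip the check.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a (keyspace, column family) -> id registry.
// NOT the real CFMetaData code; it only mirrors the duplicate check in spirit.
public class CfIdRegistryModel {
    public static class ConfigurationException extends Exception {
        public ConfigurationException(String msg) { super(msg); }
    }

    private final Map<Map.Entry<String, String>, Integer> cfIdMap = new HashMap<>();

    // Registers a column family; refuses a (keyspace, cf) pair that is already mapped.
    public void map(String keyspace, String cfName, int cfId) throws ConfigurationException {
        Map.Entry<String, String> key = new SimpleImmutableEntry<>(keyspace, cfName);
        if (cfIdMap.containsKey(key))
            throw new ConfigurationException("Attempt to assign id to existing column family.");
        cfIdMap.put(key, cfId);
    }

    public static void main(String[] args) throws Exception {
        CfIdRegistryModel registry = new CfIdRegistryModel();
        // Step 1: the built-in system column families are registered first.
        registry.map("system", "LocationInfo", 0);
        // Step 2: the stored schema is replayed; if it also describes the
        // system keyspace, the same pair is mapped a second time.
        try {
            registry.map("system", "LocationInfo", 0);
            System.out.println("no conflict");
        } catch (ConfigurationException e) {
            System.out.println("conflict: " + e.getMessage());
        }
    }
}
```

Running the model prints `conflict: Attempt to assign id to existing column family.`, which is the same message as in the trace above; whether the real startup path takes this route is exactly what is still unconfirmed.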
One relevant piece of information might be that, before the restart, our
application changed the replication factor of all the tables, including
the system table:
2011-03-29 23:09:39,194 291146 [MigrationStage:1] INFO org.apache.cassandra.db.migration.Migration - Applying migration 9f371026-5a59-11e0-b23f-65ed1eced995 Update keyspace systemrep factor:1rep strategy:LocalStrategy{...} to systemrep factor:3rep strategy:LocalStrategy{...}
We do this to change the replication factor dynamically as new nodes
are added to the cluster (e.g., it starts off with one node and a
repfactor of 1, and once there are three nodes, it increases the
repfactor on all tables to 3). Is it possible that migrations applied to
the system table get written to disk in a way that would cause
loadSchemas() to hit this exception during a restart? Are we even
allowed to change the replication factor of the system table?
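For reference, the replication-factor policy described above can be sketched roughly as follows. This is a hypothetical illustration, not our actual code: the keyspace names are made up, the rule "RF 1 until three nodes, then 3" is taken from the example, and the sketch only computes which keyspaces would need a migration. It does not call any Cassandra API, and whether "system" belongs in the input map is the open question above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the RF policy: keep RF at 1 until the cluster has
// TARGET_RF nodes, then raise every listed keyspace to TARGET_RF.
public class ReplicationPolicySketch {
    static final int TARGET_RF = 3;

    // Desired replication factor for the current number of live nodes.
    public static int targetReplicationFactor(int liveNodes) {
        return liveNodes >= TARGET_RF ? TARGET_RF : 1;
    }

    // Keyspaces whose current RF differs from the desired one.
    public static List<String> keyspacesToUpdate(Map<String, Integer> currentRf, int liveNodes) {
        int target = targetReplicationFactor(liveNodes);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : currentRf.entrySet()) {
            if (e.getValue() != target) out.add(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> rf = new LinkedHashMap<>();
        rf.put("app_data", 1);  // hypothetical keyspace names
        rf.put("app_index", 1);
        System.out.println(keyspacesToUpdate(rf, 1)); // []
        System.out.println(keyspacesToUpdate(rf, 3)); // [app_data, app_index]
    }
}
```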
Thanks,
Jeremy