On 03/30/2011 02:54 PM, Jeremy Stribling wrote:
After restarting a Cassandra 0.7.2 node, the node catches an exception
during initialization and refuses to start:
Caused by: org.apache.cassandra.config.ConfigurationException: Attempt to assign id to existing column family.
    at org.apache.cassandra.config.CFMetaData.map(CFMetaData.java:222)
    at org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:477)
    ... 2 more
Unlike a previous thread about this topic
(http://www.mail-archive.com/user@cassandra.apache.org/msg09024.html),
we are not trying to preserve the JVM across restarts. The restart
comes up in an entirely fresh JVM. We are, however, embedding
Cassandra in our application, but we're using the same steps used by
AbstractCassandraDaemon to bring it up.
Looking briefly through the code, the only way I see that this can
happen is if loadSchemas tries to load information about the system
table from storage after the system table's column families have
already been registered in CFMetaData by the earlier
DatabaseDescriptor.getTableMetaData(Table.SYSTEM_TABLE).values()
call. Alternatively, the data on disk could have multiple entries under
the same key, but the system table issue seems more likely to me.
Unfortunately the logging is not specific enough for me to tell which
key it is failing with, and I haven't been able to reproduce this yet.
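
To make the suspected failure mode concrete, here is a minimal sketch of a
registry that refuses to assign an id to a name that is already mapped. It
is a simplified stand-in, not the actual CFMetaData code: the class name
CfIdRegistry, the string key format, and the use of IllegalStateException
are all my own assumptions. Unlike the message in the trace above, this
version reports which key collided, which is the detail missing from our
logs.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a column-family id registry.
// Hypothetical class; this is NOT Cassandra's CFMetaData.
final class CfIdRegistry {
    private final Map<String, Integer> idsByName = new HashMap<>();

    // Registers an id for keyspace.columnFamily and rejects a second
    // registration of the same name, naming the colliding key.
    void map(String keyspace, String columnFamily, int cfId) {
        String key = keyspace + "." + columnFamily;
        // putIfAbsent returns the previous value, or null if the key was free.
        Integer existing = idsByName.putIfAbsent(key, cfId);
        if (existing != null) {
            throw new IllegalStateException(
                "Attempt to assign id " + cfId + " to existing column family "
                + key + " (already mapped to id " + existing + ")");
        }
    }
}
```

If the same keyspace and column family pair is mapped once from built-in
definitions and then again from definitions read back from disk, the
second call throws, which would match the exception we are seeing.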
One relevant piece of information might be that, before the restart,
our application changed the replication factor of all the tables,
including the system table:
2011-03-29 23:09:39,194 291146 [MigrationStage:1] INFO
org.apache.cassandra.db.migration.Migration - Applying migration
9f371026-5a59-11e0-b23f-65ed1eced995 Update keyspace systemrep
factor:1rep strategy:LocalStrategy{...} to systemrep factor:3rep
strategy:LocalStrategy{...}
We're doing this in order to dynamically change the replication factor
as new nodes are being added to the cluster (e.g., it starts off with
one node and a repfactor of 1, and once there are three nodes, it
increases the repfactor on all tables to 3). Is it possible that
migrations over the system table get written to disk in a way that
would cause loadSchemas() during a restart to hit this exception? Are
we even allowed to change the replication factor of the system table?
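
For reference, the policy we are applying looks roughly like the sketch
below. The class and method names are hypothetical and no Cassandra API is
called. The keyspacesToUpdate filter is not what we do today: it shows one
cautious variant that leaves the system table alone, on the unconfirmed
assumption that its replication factor should not be changed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper describing our replication policy; no Cassandra calls.
final class ReplicationPlanner {
    static final String SYSTEM_KEYSPACE = "system";

    // Replication factor 1 until the cluster has three nodes, then 3.
    static int targetReplicationFactor(int nodeCount) {
        if (nodeCount < 1) {
            throw new IllegalArgumentException("nodeCount must be at least 1");
        }
        return nodeCount >= 3 ? 3 : 1;
    }

    // Assumption: skip the system keyspace when issuing updates.
    static List<String> keyspacesToUpdate(List<String> allKeyspaces) {
        List<String> result = new ArrayList<>();
        for (String keyspace : allKeyspaces) {
            if (!SYSTEM_KEYSPACE.equals(keyspace)) {
                result.add(keyspace);
            }
        }
        return result;
    }
}
```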
I've confirmed that this happens when loading column family "IndexInfo"
from the table "system" during the loadSchemas() call. Does anyone know
if there's a way to get around this? Perhaps, as I theorized, it's
not legitimate to change the replication factor on the system table.