Schema collision results in multiple data directories per table

Tom Offermann Fri, 01 Oct 2021 09:44:05 -0700

When adding a datacenter to a keyspace (following the Last Pickle [Data
Center Switch][lp] playbook), I ran into a "Configuration exception merging
remote schema" error. The nodes in one datacenter didn't converge to the
new schema version, and after restarting them, I saw the symptoms described
in this Datastax article on [Fixing a table schema collision][ds], where
there were two data directories for each table in the keyspace on the nodes
that didn't converge. I followed the recovery steps in the Datastax article
to move the data from the older directories to the new directories, ran
`nodetool refresh`, and that fixed the problem.


[lp]: https://thelastpickle.com/blog/2019/02/26/data-center-switch.html
[ds]: https://docs.datastax.com/en/dse/6.0/cql/cql/cql_using/useCreateTableCollisionFix.html

The Datastax article got me through the recovery, but I'm still left
wondering *why* this happened. If anyone can shed some light on that, or
offer advice on how to avoid this situation in the future, I would be most
appreciative. The steps I took are described in more detail below.

## Steps

1. The day before, I had added the second datacenter ('dc2') to the
system_traces, system_distributed, and system_auth keyspaces and run
`nodetool rebuild` for each of the 3 keyspaces. All of that went smoothly
with no issues.

2. For a large keyspace, I added the second datacenter ('dc2') with an
`ALTER KEYSPACE foo WITH replication = {'class': 'NetworkTopologyStrategy',
'dc1': '2', 'dc2': '3'};` statement. Immediately, I saw this error in the
log:
    ```
    "ERROR 16:45:47 Exception in thread Thread[MigrationStage:1,5,main]"
    "org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 8ad72660-f629-11eb-a217-e1a09d8bc60c; expected 20739eb0-d92e-11e6-b42f-e7eb6f21c481)"
    "\tat org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:949) ~[apache-cassandra-3.11.5.jar:3.11.5]"
    "\tat org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:903) ~[apache-cassandra-3.11.5.jar:3.11.5]"
    "\tat org.apache.cassandra.config.Schema.updateTable(Schema.java:687) ~[apache-cassandra-3.11.5.jar:3.11.5]"
    "\tat org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1482) ~[apache-cassandra-3.11.5.jar:3.11.5]"
    "\tat org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1438) ~[apache-cassandra-3.11.5.jar:3.11.5]"
    "\tat org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1407) ~[apache-cassandra-3.11.5.jar:3.11.5]"
    "\tat org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1384) ~[apache-cassandra-3.11.5.jar:3.11.5]"
    "\tat org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:594) ~[apache-cassandra-3.11.5.jar:3.11.5]"
    "\tat org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.5.jar:3.11.5]"
    "\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_232]"
    "\tat java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_232]"
    "\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_232]"
    "\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_232]"
    "\tat org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) [apache-cassandra-3.11.5.jar:3.11.5]"
    "\tat java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_232]"
    ```

    I also saw this:
    ```
    "ERROR 16:46:48 Configuration exception merging remote schema"
    "org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 8ad72660-f629-11eb-a217-e1a09d8bc60c; expected 20739eb0-d92e-11e6-b42f-e7eb6f21c481)"
    "\tat org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:949) ~[apache-cassandra-3.11.5.jar:3.11.5]"
    "\tat org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:903) ~[apache-cassandra-3.11.5.jar:3.11.5]"
    "\tat org.apache.cassandra.config.Schema.updateTable(Schema.java:687) ~[apache-cassandra-3.11.5.jar:3.11.5]"
    "\tat org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1482) ~[apache-cassandra-3.11.5.jar:3.11.5]"
    "\tat org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1438) ~[apache-cassandra-3.11.5.jar:3.11.5]"
    "\tat org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1407) ~[apache-cassandra-3.11.5.jar:3.11.5]"
    "\tat org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1384) ~[apache-cassandra-3.11.5.jar:3.11.5]"
    "\tat org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:91) ~[apache-cassandra-3.11.5.jar:3.11.5]"
    "\tat org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53) [apache-cassandra-3.11.5.jar:3.11.5]"
    "\tat org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) [apache-cassandra-3.11.5.jar:3.11.5]"
    "\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_232]"
    "\tat java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_232]"
    "\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_232]"
    "\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_232]"
    "\tat org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) [apache-cassandra-3.11.5.jar:3.11.5]"
    "\tat java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_232]"
    ```
    This error repeated several times over the next 2 minutes.

3. While running `nodetool describecluster` repeatedly, I saw that the
nodes in the 'dc2' datacenter converged to the new schema version quickly,
but the nodes in the original datacenter ('dc1') remained at the previous
schema version.

4. I waited to see if all of the nodes would converge to the new schema
version, but they still hadn't converged after roughly 10 minutes. Given
the errors I saw, I wasn't optimistic it would work out all by itself, so I
decided to restart the nodes in the 'dc1' datacenter one at a time so they
would restart with the latest schema version.

5. After each node restarted, `nodetool describecluster` showed it as being
on the latest schema version. So, after getting through all the 'dc1'
nodes, it looked like everything in the cluster was healthy again.

6. However, that's when I noticed that there were two data directories on
disk for each table in the keyspace. New writes for a table were being
saved in the newer directory, but queries for data saved in the old data
directory were returning no results.

7. That's when I followed the recovery steps in the Datastax article with
great success.
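
For anyone who wants to check for the symptom in step 6, here is a minimal
sketch of how duplicate table directories could be spotted. It relies on
Cassandra naming each table directory `<table>-<table id>`, with the table
ID written as 32 hex characters (the UUID without dashes). The function
names and the table name `events` in the usage note are hypothetical and
only for illustration; the two IDs are the ones from the error message
above.

```python
import re
from collections import defaultdict
from pathlib import Path

# Table directories are named "<table>-<table id as 32 hex characters>".
TABLE_DIR = re.compile(r"^(?P<table>.+)-(?P<id>[0-9a-f]{32})$")


def table_id_to_suffix(table_id):
    """Convert a table ID as logged (a UUID with dashes) to its directory suffix."""
    return table_id.replace("-", "").lower()


def duplicate_table_dirs(keyspace_dir):
    """Return {table: [directory names]} for tables with more than one directory.

    keyspace_dir is the keyspace's directory under a Cassandra data directory;
    its location depends on the node's configuration (assumption).
    """
    by_table = defaultdict(list)
    for entry in Path(keyspace_dir).iterdir():
        match = TABLE_DIR.match(entry.name)
        if entry.is_dir() and match:
            by_table[match.group("table")].append(entry.name)
    return {table: sorted(dirs) for table, dirs in by_table.items() if len(dirs) > 1}
```

With a hypothetical table `events`, a keyspace directory containing both
`events-20739eb0d92e11e6b42fe7eb6f21c481` and
`events-8ad72660f62911eba217e1a09d8bc60c` would be reported, and
`table_id_to_suffix` helps match the "found" and "expected" IDs in the log
to those two directories.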

## Questions

* My understanding is that running concurrent schema updates should always
be avoided, since that can result in schema collisions. But, in this case,
I wasn't performing multiple schema updates. I was just running a single
`ALTER KEYSPACE` statement. Any idea why a single schema update would
result in a schema collision and two data directories per table?

* Should I have waited longer before restarting nodes? Perhaps, given
enough time, the Cassandra nodes would have all converged on the correct
schema version, and this would have resolved on its own?

* Any suggestions for how I can avoid this problem in the future?
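
One precaution I can think of is to confirm schema agreement before and
after any schema change, rather than eyeballing `nodetool describecluster`
as I did in step 3. The sketch below parses that command's output; it
assumes the "Schema versions:" section lists one `<version>: [hosts]` line
per version (plus an `UNREACHABLE` line for unreachable nodes), which is
what I see on 3.11. The function names are my own. I don't know that this
check would have prevented the collision, so treat it as a sanity check
only.

```python
import re

# Matches lines such as "86afa796-...-6b017cef0d19: [10.0.0.1, 10.0.0.2]"
# or "UNREACHABLE: [10.0.0.3]" in the "Schema versions:" section.
SCHEMA_LINE = re.compile(r"^\s*([0-9a-f-]{36}|UNREACHABLE):\s*\[(.*)\]\s*$")


def schema_versions(describecluster_output):
    """Return {schema version: [hosts]} parsed from `nodetool describecluster` text."""
    versions = {}
    for line in describecluster_output.splitlines():
        match = SCHEMA_LINE.match(line)
        if match:
            hosts = [h.strip() for h in match.group(2).split(",") if h.strip()]
            versions[match.group(1)] = hosts
    return versions


def in_agreement(describecluster_output):
    """True only if exactly one schema version is listed and no node is unreachable."""
    versions = schema_versions(describecluster_output)
    return len(versions) == 1 and "UNREACHABLE" not in versions
```

The text to parse can be captured with something like
`subprocess.run(["nodetool", "describecluster"], capture_output=True, text=True, check=True).stdout`
and the check repeated until it passes or a timeout expires.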

-- 
Tom Offermann
Lead Software Engineer
http://newrelic.com
