When adding a datacenter to a keyspace (following the Last Pickle [Data Center Switch][lp] playbook), I ran into a "Configuration exception merging remote schema" error. The nodes in one datacenter didn't converge to the new schema version, and after restarting them, I saw the symptoms described in this Datastax article on [Fixing a table schema collision][ds], where there were two data directories for each table in the keyspace on the nodes that didn't converge. I followed the recovery steps in the Datastax article to move the data from the older directories to the new directories, ran `nodetool refresh`, and that fixed the problem.
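A minimal sketch for spotting the duplicate-directory symptom described above. It assumes the usual Cassandra 3.x on-disk layout, where each table directory under a keyspace directory is named `<table>-<table ID as 32 hex characters>`. The function name and the path in the usage note are hypothetical, not taken from the Datastax article.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: table directories are named "<table>-<32 lowercase hex chars>",
# i.e. the table ID (a UUID) with its dashes removed.
TABLE_DIR = re.compile(r"^(?P<table>.+)-(?P<table_id>[0-9a-f]{32})$")


def find_duplicate_table_dirs(keyspace_dir):
    """Return {table name: sorted directory names} for every table that has
    more than one data directory under the given keyspace directory."""
    by_table = defaultdict(list)
    for entry in Path(keyspace_dir).iterdir():
        match = TABLE_DIR.match(entry.name)
        if entry.is_dir() and match:
            by_table[match.group("table")].append(entry.name)
    # Keep only the tables with two or more directories.
    return {table: sorted(dirs) for table, dirs in by_table.items() if len(dirs) > 1}
```

Calling `find_duplicate_table_dirs("/var/lib/cassandra/data/foo")` on each node (the path is a placeholder for wherever `data_file_directories` points) would list the tables that need the recovery steps. An empty result means every table has a single directory.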
[lp]: https://thelastpickle.com/blog/2019/02/26/data-center-switch.html
[ds]: https://docs.datastax.com/en/dse/6.0/cql/cql/cql_using/useCreateTableCollisionFix.html

While the Datastax article was a great help in recovering, I'm left wondering *why* this happened. If anyone can shed some light on that, or offer advice on how I can avoid getting into this situation in the future, I would be most appreciative.

I'll describe the steps I took in more detail in the thread.

## Steps

1. The day before, I had added the second datacenter ('dc2') to the system_traces, system_distributed, and system_auth keyspaces and ran `nodetool rebuild` for each of the 3 keyspaces. All of that went smoothly with no issues.

2. For a large keyspace, I added the second datacenter ('dc2') with an `ALTER KEYSPACE foo WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '2', 'dc2': '3'};` statement. Immediately, I saw this error in the log:

   ```
   "ERROR 16:45:47 Exception in thread Thread[MigrationStage:1,5,main]"
   "org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 8ad72660-f629-11eb-a217-e1a09d8bc60c; expected 20739eb0-d92e-11e6-b42f-e7eb6f21c481)"
   "\tat org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:949) ~[apache-cassandra-3.11.5.jar:3.11.5]"
   "\tat org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:903) ~[apache-cassandra-3.11.5.jar:3.11.5]"
   "\tat org.apache.cassandra.config.Schema.updateTable(Schema.java:687) ~[apache-cassandra-3.11.5.jar:3.11.5]"
   "\tat org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1482) ~[apache-cassandra-3.11.5.jar:3.11.5]"
   "\tat org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1438) ~[apache-cassandra-3.11.5.jar:3.11.5]"
   "\tat org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1407) ~[apache-cassandra-3.11.5.jar:3.11.5]"
   "\tat org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1384) ~[apache-cassandra-3.11.5.jar:3.11.5]"
   "\tat org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:594) ~[apache-cassandra-3.11.5.jar:3.11.5]"
   "\tat org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.5.jar:3.11.5]"
   "\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_232]"
   "\tat java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_232]"
   "\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_232]"
   "\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_232]"
   "\tat org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) [apache-cassandra-3.11.5.jar:3.11.5]"
   "\tat java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_232]"
   ```

   I also saw this:

   ```
   "ERROR 16:46:48 Configuration exception merging remote schema"
   "org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 8ad72660-f629-11eb-a217-e1a09d8bc60c; expected 20739eb0-d92e-11e6-b42f-e7eb6f21c481)"
   "\tat org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:949) ~[apache-cassandra-3.11.5.jar:3.11.5]"
   "\tat org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:903) ~[apache-cassandra-3.11.5.jar:3.11.5]"
   "\tat org.apache.cassandra.config.Schema.updateTable(Schema.java:687) ~[apache-cassandra-3.11.5.jar:3.11.5]"
   "\tat org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1482) ~[apache-cassandra-3.11.5.jar:3.11.5]"
   "\tat org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1438) ~[apache-cassandra-3.11.5.jar:3.11.5]"
   "\tat org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1407) ~[apache-cassandra-3.11.5.jar:3.11.5]"
   "\tat org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1384) ~[apache-cassandra-3.11.5.jar:3.11.5]"
   "\tat org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:91) ~[apache-cassandra-3.11.5.jar:3.11.5]"
   "\tat org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53) [apache-cassandra-3.11.5.jar:3.11.5]"
   "\tat org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) [apache-cassandra-3.11.5.jar:3.11.5]"
   "\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_232]"
   "\tat java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_232]"
   "\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_232]"
   "\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_232]"
   "\tat org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) [apache-cassandra-3.11.5.jar:3.11.5]"
   "\tat java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_232]"
   ```

   This error repeated several times over the next 2 minutes.

3. While running `nodetool describecluster` repeatedly, I saw that the nodes in the 'dc2' datacenter converged to the new schema version quickly, but the nodes in the original datacenter ('dc1') remained at the previous schema version.

4. I waited to see if all of the nodes would converge to the new schema version, but they still hadn't converged after roughly 10 minutes. Given the errors I saw, I wasn't optimistic that it would resolve by itself, so I decided to restart the nodes in the 'dc1' datacenter one at a time so they would come back up with the latest schema version.

5. After each node restarted, `nodetool describecluster` showed it as being on the latest schema version. So, after getting through all the 'dc1' nodes, it looked like everything in the cluster was healthy again.

6. However, that's when I noticed that there were two data directories on disk for each table in the keyspace. New writes for a table were being saved in the newer directory, but queries for data saved in the old data directory were returning no results.

7. At that point, I followed the recovery steps in the Datastax article with great success.

## Questions

* My understanding is that running concurrent schema updates should always be avoided, since that can result in schema collisions. But, in this case, I wasn't performing multiple schema updates. I was just running a single `ALTER KEYSPACE` statement. Any idea why a single schema update would result in a schema collision and two data directories per table?

* Should I have waited longer before restarting nodes? Perhaps, given enough time, the Cassandra nodes would have all converged on the correct schema version, and this would have resolved on its own?

* Any suggestions for how I can avoid this problem in the future?

--
Tom Offermann
Lead Software Engineer
http://newrelic.com