Hi Tom,

While I am not completely sure what caused your issue, I just want
to highlight that schema agreement was overhauled significantly in
4.0 (1), so what you are seeing may be related to what that ticket
was trying to fix.

Regards

(1) https://issues.apache.org/jira/browse/CASSANDRA-15158

On Fri, 1 Oct 2021 at 18:43, Tom Offermann <tofferm...@newrelic.com> wrote:
>
> When adding a datacenter to a keyspace (following the Last Pickle [Data 
> Center Switch][lp] playbook), I ran into a "Configuration exception merging 
> remote schema" error. The nodes in one datacenter didn't converge to the new 
> schema version, and after restarting them, I saw the symptoms described in 
> this Datastax article on [Fixing a table schema collision][ds], where there 
> were two data directories for each table in the keyspace on the nodes that 
> didn't converge. I followed the recovery steps in the Datastax article to 
> move the data from the older directories to the new directories, ran 
> `nodetool refresh`, and that fixed the problem.
>
> [lp]: https://thelastpickle.com/blog/2019/02/26/data-center-switch.html
> [ds]: 
> https://docs.datastax.com/en/dse/6.0/cql/cql/cql_using/useCreateTableCollisionFix.html
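>
> For reference, the recovery boiled down to roughly the following per table
> (the data path, keyspace, and table names are illustrative, and you have to
> confirm which directory is the orphaned one on each node before moving
> anything):
>
> ```
> # Move the SSTable files from the orphaned (old-ID) directory into the
> # directory Cassandra is currently writing to; leave any snapshots/ or
> # backups/ subdirectories behind.
> sudo mv /var/lib/cassandra/data/foo/mytable-<old-id>/*.* \
>         /var/lib/cassandra/data/foo/mytable-<new-id>/
>
> # Tell Cassandra to load the newly placed SSTables without a restart:
> nodetool refresh foo mytable
> ```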
>
> While the Datastax article was a great help in recovering, I'm left
> wondering *why* this happened. If anyone can shed some light on that, or
> offer advice on how I can avoid getting into this situation in the future, I
> would be most appreciative. I'll describe the steps I took in more detail in
> the thread.
>
> ## Steps
>
> 1. The day before, I had added the second datacenter ('dc2') to the 
> system_traces, system_distributed, and system_auth keyspaces and ran 
> `nodetool rebuild` for each of the 3 keyspaces. All of that went smoothly 
> with no issues.
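>
>     The commands for that step were roughly of this form (the replication
>     factors shown are just examples, not the exact values I used):
>     ```
>     # Run the ALTERs once from any node:
>     cqlsh -e "ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'};"
>     cqlsh -e "ALTER KEYSPACE system_distributed WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'};"
>     cqlsh -e "ALTER KEYSPACE system_traces WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'};"
>
>     # Then, on each node in dc2, stream the existing data over from dc1:
>     nodetool rebuild -- dc1
>     ```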
>
> 2. For a large keyspace, I added the second datacenter ('dc2') with an `ALTER 
> KEYSPACE foo WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 
> '2', 'dc2': '3'};` statement. Immediately, I saw this error in the log:
>     ```
>     ERROR 16:45:47 Exception in thread Thread[MigrationStage:1,5,main]
>     org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 8ad72660-f629-11eb-a217-e1a09d8bc60c; expected 20739eb0-d92e-11e6-b42f-e7eb6f21c481)
>         at org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:949) ~[apache-cassandra-3.11.5.jar:3.11.5]
>         at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:903) ~[apache-cassandra-3.11.5.jar:3.11.5]
>         at org.apache.cassandra.config.Schema.updateTable(Schema.java:687) ~[apache-cassandra-3.11.5.jar:3.11.5]
>         at org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1482) ~[apache-cassandra-3.11.5.jar:3.11.5]
>         at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1438) ~[apache-cassandra-3.11.5.jar:3.11.5]
>         at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1407) ~[apache-cassandra-3.11.5.jar:3.11.5]
>         at org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1384) ~[apache-cassandra-3.11.5.jar:3.11.5]
>         at org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:594) ~[apache-cassandra-3.11.5.jar:3.11.5]
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.5.jar:3.11.5]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_232]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_232]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_232]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_232]
>         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) [apache-cassandra-3.11.5.jar:3.11.5]
>         at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_232]
>     ```
>
>     I also saw this:
>     ```
>     ERROR 16:46:48 Configuration exception merging remote schema
>     org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 8ad72660-f629-11eb-a217-e1a09d8bc60c; expected 20739eb0-d92e-11e6-b42f-e7eb6f21c481)
>         at org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:949) ~[apache-cassandra-3.11.5.jar:3.11.5]
>         at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:903) ~[apache-cassandra-3.11.5.jar:3.11.5]
>         at org.apache.cassandra.config.Schema.updateTable(Schema.java:687) ~[apache-cassandra-3.11.5.jar:3.11.5]
>         at org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1482) ~[apache-cassandra-3.11.5.jar:3.11.5]
>         at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1438) ~[apache-cassandra-3.11.5.jar:3.11.5]
>         at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1407) ~[apache-cassandra-3.11.5.jar:3.11.5]
>         at org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1384) ~[apache-cassandra-3.11.5.jar:3.11.5]
>         at org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:91) ~[apache-cassandra-3.11.5.jar:3.11.5]
>         at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53) [apache-cassandra-3.11.5.jar:3.11.5]
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) [apache-cassandra-3.11.5.jar:3.11.5]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_232]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_232]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_232]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_232]
>         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) [apache-cassandra-3.11.5.jar:3.11.5]
>         at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_232]
>     ```
>     This error repeated several times over the next 2 minutes.
>
> 3. While running `nodetool describecluster` repeatedly, I saw that the nodes 
> in the 'dc2' datacenter converged to the new schema version quickly, but the 
> nodes in the original datacenter ('dc1') remained at the previous schema 
> version.
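>
>     For the repeated checks, something along these lines works (the exact
>     invocation is illustrative):
>     ```
>     # Poll schema agreement across the cluster every few seconds:
>     watch -n 10 'nodetool describecluster | grep -A 10 "Schema versions"'
>     ```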
>
> 4. I waited to see if all of the nodes would converge to the new schema 
> version, but they still hadn't converged after roughly 10 minutes. Given the 
> errors I saw, I wasn't optimistic it would work out all by itself, so I 
> decided to restart the nodes in the 'dc1' datacenter one at a time so they 
> would restart with the latest schema version.
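>
>     The per-node restart was essentially the following (the service name
>     and init system are assumptions about the environment):
>     ```
>     # On each dc1 node in turn:
>     nodetool drain
>     sudo systemctl restart cassandra
>     # Wait for the node to come back up, then confirm it reports the
>     # latest schema version before moving on to the next one:
>     nodetool describecluster
>     ```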
>
> 5. After each node restarted, `nodetool describecluster` showed it as being 
> on the latest schema version. So, after getting through all the 'dc1' nodes, 
> it looked like everything in the cluster was healthy again.
>
> 6. However, that's when I noticed that there were two data directories on 
> disk for each table in the keyspace. New writes for a table were being saved 
> in the newer directory, but queries for data saved in the old data directory 
> were returning no results.
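>
>     Concretely, each table in the keyspace had two ID-suffixed data
>     directories, something like this (the data path and table name are
>     illustrative; the IDs are the two from the error above):
>     ```
>     $ ls -d /var/lib/cassandra/data/foo/mytable-*
>     /var/lib/cassandra/data/foo/mytable-20739eb0d92e11e6b42fe7eb6f21c481   <-- older table ID
>     /var/lib/cassandra/data/foo/mytable-8ad72660f62911eba217e1a09d8bc60c   <-- newer table ID (new writes went here)
>     ```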
>
> 7. That's when I followed the recovery steps in the Datastax article with 
> great success.
>
> ## Questions
>
> * My understanding is that running concurrent schema updates should always be 
> avoided, since that can result in schema collisions. But, in this case, I 
> wasn't performing multiple schema updates. I was just running a single `ALTER 
> KEYSPACE` statement. Any idea why a single schema update would result in a 
> schema collision and two data directories per table?
>
> * Should I have waited longer before restarting nodes? Perhaps, given enough 
> time, the Cassandra nodes would have all converged on the correct schema 
> version, and this would have resolved on its own?
>
> * Any suggestions for how I can avoid this problem in the future?
>
> --
> Tom Offermann
> Lead Software Engineer
> http://newrelic.com
