Re: Schema collision results in multiple data directories per table

vytenis silgalis Wed, 13 Oct 2021 10:22:48 -0700

Did you run the `ALTER KEYSPACE` command on the original dc1 nodes or on the new
dc2 nodes?


On Wed, Oct 13, 2021 at 8:15 AM Stefan Miklosovic <stefan.mikloso...@instaclustr.com> wrote:

> Hi Tom,
>
> While I am not completely sure what might be causing your issue, I want to
> point out that schema agreement was overhauled substantially in 4.0 (1),
> so your problem may be related to what that ticket was trying to fix.
>
> Regards
>
> (1) https://issues.apache.org/jira/browse/CASSANDRA-15158
>
> On Fri, 1 Oct 2021 at 18:43, Tom Offermann <tofferm...@newrelic.com> wrote:
> >
> > When adding a datacenter to a keyspace (following The Last Pickle's
> > [Data Center Switch][lp] playbook), I ran into a "Configuration exception
> > merging remote schema" error. The nodes in one datacenter didn't converge
> > to the new schema version, and after I restarted them I saw the symptoms
> > described in this DataStax article on [Fixing a table schema collision][ds]:
> > on the nodes that didn't converge, there were two data directories for each
> > table in the keyspace. I followed the recovery steps in the DataStax
> > article to move the data from the older directories to the new
> > directories, ran `nodetool refresh`, and that fixed the problem.
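A quick way to spot this symptom on a node is to group the directories under the keyspace's data directory by table name. The sketch below is a minimal illustration, assuming the usual 3.x on-disk layout `<data_dir>/<keyspace>/<table>-<table id as 32 hex digits>/`; `list_table_dirs` and `find_duplicate_table_dirs` are hypothetical helper names, not part of any Cassandra tooling.

```python
import os
import re
from collections import defaultdict

# Assumed 3.x layout: <data_dir>/<keyspace>/<table>-<table id as 32 hex digits>/
TABLE_DIR_RE = re.compile(r"^(?P<table>\w+)-(?P<table_id>[0-9a-f]{32})$")


def list_table_dirs(keyspace_dir):
    """Names of the subdirectories of one keyspace's data directory."""
    return [
        name
        for name in os.listdir(keyspace_dir)
        if os.path.isdir(os.path.join(keyspace_dir, name))
    ]


def find_duplicate_table_dirs(dir_names):
    """Map table name -> sorted directory names, keeping only tables with more than one."""
    dirs_by_table = defaultdict(list)
    for name in dir_names:
        match = TABLE_DIR_RE.match(name)
        if match:  # ignore anything that doesn't look like a table directory
            dirs_by_table[match.group("table")].append(name)
    return {
        table: sorted(names)
        for table, names in dirs_by_table.items()
        if len(names) > 1
    }
```

For example, `find_duplicate_table_dirs(list_table_dirs("/var/lib/cassandra/data/foo"))` should list each table that has both an old and a new directory; the path is only an example and depends on `data_file_directories` in cassandra.yaml.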
> >
> > [lp]: https://thelastpickle.com/blog/2019/02/26/data-center-switch.html
> > [ds]: https://docs.datastax.com/en/dse/6.0/cql/cql/cql_using/useCreateTableCollisionFix.html
> >
> > While the DataStax article was very helpful for recovering, I'm left
> > wondering *why* this happened. If anyone can shed some light on that, or
> > offer advice on how I can avoid getting into this situation in the future,
> > I would be most appreciative. I'll describe the steps I took in more
> > detail below.
> >
> > ## Steps
> >
> > 1. The day before, I had added the second datacenter ('dc2') to the
> > system_traces, system_distributed, and system_auth keyspaces and ran
> > `nodetool rebuild` for each of the 3 keyspaces. All of that went smoothly
> > with no issues.
> >
> > 2. For a large keyspace, I added the second datacenter ('dc2') with an
> > `ALTER KEYSPACE foo WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '2', 'dc2': '3'};`
> > statement. Immediately, I saw this error in the log:
> >     ```
> >     "ERROR 16:45:47 Exception in thread Thread[MigrationStage:1,5,main]"
> >     "org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 8ad72660-f629-11eb-a217-e1a09d8bc60c; expected 20739eb0-d92e-11e6-b42f-e7eb6f21c481)"
> >     "\tat org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:949) ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:903) ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat org.apache.cassandra.config.Schema.updateTable(Schema.java:687) ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1482) ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1438) ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1407) ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1384) ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:594) ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_232]"
> >     "\tat java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_232]"
> >     "\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_232]"
> >     "\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_232]"
> >     "\tat org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) [apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_232]"
> >     ```
> >
> >     I also saw this:
> >     ```
> >     "ERROR 16:46:48 Configuration exception merging remote schema"
> >     "org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 8ad72660-f629-11eb-a217-e1a09d8bc60c; expected 20739eb0-d92e-11e6-b42f-e7eb6f21c481)"
> >     "\tat org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:949) ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:903) ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat org.apache.cassandra.config.Schema.updateTable(Schema.java:687) ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1482) ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1438) ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1407) ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1384) ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:91) ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53) [apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) [apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_232]"
> >     "\tat java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_232]"
> >     "\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_232]"
> >     "\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_232]"
> >     "\tat org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) [apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_232]"
> >     ```
> >     This error repeated several times over the next 2 minutes.
> >
> > 3. While running `nodetool describecluster` repeatedly, I saw that the
> > nodes in the 'dc2' datacenter converged to the new schema version quickly,
> > but the nodes in the original datacenter ('dc1') remained at the previous
> > schema version.
> >
> > 4. I waited to see whether all of the nodes would converge to the new
> > schema version, but they still hadn't converged after roughly 10 minutes.
> > Given the errors I saw, I wasn't optimistic it would resolve itself, so I
> > decided to restart the nodes in the 'dc1' datacenter one at a time so they
> > would come back up with the latest schema version.
> >
> > 5. After each node restarted, `nodetool describecluster` showed it on the
> > latest schema version. So, after getting through all the 'dc1' nodes, it
> > looked like everything in the cluster was healthy again.
> >
> > 6. However, that's when I noticed that there were two data directories on
> > disk for each table in the keyspace. New writes for a table were being
> > saved in the newer directory, but queries for data saved in the old data
> > directory returned no results.
> >
> > 7. That's when I followed the recovery steps in the DataStax article, with
> > great success.
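The two IDs in the `Column family ID mismatch` error from step 2 are time-based (version 1) UUIDs, so they can be decoded to see roughly when each table ID was created. Assuming the usual directory naming, each ID's 32 hex digits (dashes removed) are also the suffix of the matching table directory on disk. A minimal sketch with hypothetical helper names (`parse_id_mismatch`, `uuid1_created_at`); decoded this way, the "expected" ID nominally dates from January 2017 and the "found" ID from August 2021.

```python
import re
import uuid
from datetime import datetime, timedelta, timezone

# Matches the message text seen in the log excerpts above.
MISMATCH_RE = re.compile(
    r"Column family ID mismatch \(found (?P<found>[0-9a-f-]{36}); "
    r"expected (?P<expected>[0-9a-f-]{36})\)"
)

# Version 1 UUID timestamps count 100 ns intervals since 1582-10-15 UTC.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_created_at(u):
    """Creation time encoded in a version 1 (time-based) UUID."""
    if u.version != 1:
        raise ValueError(f"{u} is not a time-based UUID")
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


def parse_id_mismatch(log_line):
    """Return (found, expected) table IDs from a mismatch log line, or None."""
    match = MISMATCH_RE.search(log_line)
    if match is None:
        return None
    return uuid.UUID(match.group("found")), uuid.UUID(match.group("expected"))


line = (
    "org.apache.cassandra.exceptions.ConfigurationException: Column family ID "
    "mismatch (found 8ad72660-f629-11eb-a217-e1a09d8bc60c; "
    "expected 20739eb0-d92e-11e6-b42f-e7eb6f21c481)"
)
found, expected = parse_id_mismatch(line)
for label, table_id in (("found", found), ("expected", expected)):
    # table_id.hex is the assumed directory suffix: "<table>-<hex>"
    print(label, table_id.hex, uuid1_created_at(table_id).isoformat())
```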
> >
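The recovery in step 7 amounts to moving the SSTable files from each stale directory into the directory whose suffix matches the table's current ID (as reported in the `id` column of `system_schema.tables`), then running `nodetool refresh`. The following is a hedged sketch of only the file-moving part, not the DataStax procedure itself; follow the article for the surrounding steps. `regular_files`, `plan_moves`, and `apply_moves` are hypothetical names. The sketch defaults to a dry run, skips subdirectories such as `snapshots/` and `backups/`, and refuses to plan a move that would overwrite a file, since SSTable generation numbers in the two directories can collide.

```python
import os
import shutil


def regular_files(path):
    """Plain files directly inside path; subdirectories (snapshots/, backups/) are skipped."""
    return sorted(n for n in os.listdir(path) if os.path.isfile(os.path.join(path, n)))


def plan_moves(table, live_table_id, files_by_dir):
    """Plan (src_dir, dst_dir, file_name) moves from stale directories into the live one.

    files_by_dir maps each table directory name to the regular files it contains.
    live_table_id is the table's current id as 32 hex digits (dashes removed).
    Raises instead of planning a move that would overwrite an existing file.
    """
    live_dir = f"{table}-{live_table_id}"
    if live_dir not in files_by_dir:
        raise ValueError(f"live directory {live_dir} not found")
    taken = set(files_by_dir[live_dir])
    moves = []
    for dir_name in sorted(files_by_dir):
        if dir_name == live_dir or not dir_name.startswith(table + "-"):
            continue  # skip the live directory and other tables' directories
        for file_name in files_by_dir[dir_name]:
            if file_name in taken:
                raise FileExistsError(f"{file_name} already exists in {live_dir}")
            taken.add(file_name)
            moves.append((dir_name, live_dir, file_name))
    return moves


def apply_moves(keyspace_dir, moves, dry_run=True):
    """Print the planned moves; only touch the disk when dry_run is False."""
    for src_dir, dst_dir, file_name in moves:
        src = os.path.join(keyspace_dir, src_dir, file_name)
        dst = os.path.join(keyspace_dir, dst_dir, file_name)
        print("would move" if dry_run else "moving", src, "->", dst)
        if not dry_run:
            shutil.move(src, dst)
```

Refusing on a collision rather than renaming keeps the sketch simple; renaming would mean renumbering every component file of an SSTable consistently.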
> > ## Questions
> >
> > * My understanding is that concurrent schema updates should always be
> > avoided, since they can result in schema collisions. In this case, though,
> > I wasn't performing multiple schema updates; I was running a single
> > `ALTER KEYSPACE` statement. Any idea why a single schema update would
> > result in a schema collision and two data directories per table?
> >
> > * Should I have waited longer before restarting nodes? Perhaps, given
> > enough time, the Cassandra nodes would all have converged on the correct
> > schema version, and this would have resolved on its own?
> >
> > * Any suggestions for how I can avoid this problem in the future?
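On the question of waiting longer: one way to make that decision less ad hoc is to poll `nodetool describecluster` and only proceed (or escalate) once every node reports the same schema version, with an explicit timeout. This is a minimal sketch, assuming the 3.11-style output in which each line under `Schema versions:` looks like `<version uuid>: [ip, ip, ...]` (or `UNREACHABLE: [...]`), and assuming `nodetool` is on PATH; the helper names and the timeout and interval values are hypothetical.

```python
import re
import subprocess
import time

# Assumed line format under "Schema versions:" in `nodetool describecluster`.
VERSION_LINE_RE = re.compile(
    r"^[ \t]*(?P<version>[0-9a-f-]{36}|UNREACHABLE): \[(?P<hosts>[^\]]*)\][ \t]*$",
    re.MULTILINE,
)


def schema_versions(describecluster_output):
    """Map schema version (or "UNREACHABLE") -> list of host addresses."""
    return {
        m.group("version"): [h.strip() for h in m.group("hosts").split(",") if h.strip()]
        for m in VERSION_LINE_RE.finditer(describecluster_output)
    }


def nodetool_describecluster():
    """Run `nodetool describecluster` locally and return its stdout."""
    return subprocess.run(
        ["nodetool", "describecluster"], capture_output=True, text=True, check=True
    ).stdout


def wait_for_schema_agreement(fetch=nodetool_describecluster, timeout_s=600,
                              interval_s=10, sleep=time.sleep, clock=time.monotonic):
    """Poll until exactly one reachable schema version is reported; return it, or raise TimeoutError."""
    deadline = clock() + timeout_s
    while True:
        versions = schema_versions(fetch())
        if len(versions) == 1 and "UNREACHABLE" not in versions:
            return next(iter(versions))
        if clock() >= deadline:
            raise TimeoutError(f"no schema agreement after {timeout_s}s: {versions}")
        sleep(interval_s)
```

The `fetch`, `sleep`, and `clock` parameters exist only so the polling logic can be exercised without a live cluster; deciding what to do after a timeout (wait more, investigate, or restart as in step 4) is still a judgment call.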
> >
> > --
> > Tom Offermann
> > Lead Software Engineer
> > http://newrelic.com
>
