Stefan,

Yes, this is probably one of many good reasons to upgrade!

Upgrading to Cassandra 4.0 is definitely on our roadmap, but we're hoping
to complete these migrations before we upgrade.

However, if we keep running into this problem, we may have to rethink that
ordering.

--Tom

On Wed, Oct 13, 2021 at 6:15 AM Stefan Miklosovic <
stefan.mikloso...@instaclustr.com> wrote:

> Hi Tom,
>
> while I am not completely sure what might cause your issue, I just
> want to highlight that schema agreement was overhauled substantially in
> 4.0 (1), so your issue may be related to what that ticket was fixing.
>
> Regards
>
> (1) https://issues.apache.org/jira/browse/CASSANDRA-15158
>
> On Fri, 1 Oct 2021 at 18:43, Tom Offermann <tofferm...@newrelic.com>
> wrote:
> >
> > When adding a datacenter to a keyspace (following the Last Pickle [Data
> Center Switch][lp] playbook), I ran into a "Configuration exception merging
> remote schema" error. The nodes in one datacenter didn't converge to the
> new schema version, and after restarting them, I saw the symptoms described
> in this Datastax article on [Fixing a table schema collision][ds], where
> there were two data directories for each table in the keyspace on the nodes
> that didn't converge. I followed the recovery steps in the Datastax article
> to move the data from the older directories to the new directories, ran
> `nodetool refresh`, and that fixed the problem.
> >
> > [lp]: https://thelastpickle.com/blog/2019/02/26/data-center-switch.html
> > [ds]:
> https://docs.datastax.com/en/dse/6.0/cql/cql/cql_using/useCreateTableCollisionFix.html
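
A condensed sketch of that recovery (my own rendering of the Datastax steps; the paths and table names are illustrative, and the two cf-id suffixes are the IDs from the error shown below in step 2):

```shell
# On each node with duplicate directories, for each affected table:
cd /var/lib/cassandra/data/foo

# Two directories for one table -- the suffix is the table's cf-id.
# The older one holds the pre-existing SSTables; new writes land in the newer one.
ls -d mytable-*
# mytable-20739eb0d92e11e6b42fe7eb6f21c481    (older)
# mytable-8ad72660f62911eba217e1a09d8bc60c    (newer)

# Move the SSTables from the old directory into the new one...
mv mytable-20739eb0d92e11e6b42fe7eb6f21c481/* \
   mytable-8ad72660f62911eba217e1a09d8bc60c/

# ...and have Cassandra load them without a restart.
nodetool refresh foo mytable
```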
> >
> > While the Datastax article was invaluable in helping me recover, I'm
> left wondering *why* this happened. If anyone can shed some light on that,
> or offer advice on how I can avoid getting in this situation in the future,
> I would be most appreciative. I'll describe the steps I took in more detail
> in the thread.
> >
> > ## Steps
> >
> > 1. The day before, I had added the second datacenter ('dc2') to the
> system_traces, system_distributed, and system_auth keyspaces and ran
> `nodetool rebuild` for each of the 3 keyspaces. All of that went smoothly
> with no issues.
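
Spelled out, that step was along these lines for each of the three keyspaces (the replication factors here are illustrative, and the trailing argument to `rebuild` names the existing datacenter to stream from):

```shell
# Add dc2 to the keyspace's replication map, e.g.:
cqlsh -e "ALTER KEYSPACE system_auth WITH replication =
  {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'};"

# Then, on every node in dc2, stream the existing data from dc1:
nodetool rebuild -- dc1
```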
> >
> > 2. For a large keyspace, I added the second datacenter ('dc2') with an
> `ALTER KEYSPACE foo WITH replication = {'class': 'NetworkTopologyStrategy',
> 'dc1': '2', 'dc2': '3'};` statement. Immediately, I saw this error in the
> log:
> >     ```
> >     "ERROR 16:45:47 Exception in thread Thread[MigrationStage:1,5,main]"
> >     "org.apache.cassandra.exceptions.ConfigurationException: Column
> family ID mismatch (found 8ad72660-f629-11eb-a217-e1a09d8bc60c; expected
> 20739eb0-d92e-11e6-b42f-e7eb6f21c481)"
> >     "\tat
> org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:949)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat
> org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:903)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat
> org.apache.cassandra.config.Schema.updateTable(Schema.java:687)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat
> org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1482)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1438)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1407)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1384)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat
> org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:594)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ~[na:1.8.0_232]"
> >     "\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ~[na:1.8.0_232]"
> >     "\tat
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ~[na:1.8.0_232]"
> >     "\tat
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [na:1.8.0_232]"
> >     "\tat
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
> [apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_232]"
> >     ```
> >
> >     I also saw this:
> >     ```
> >     "ERROR 16:46:48 Configuration exception merging remote schema"
> >     "org.apache.cassandra.exceptions.ConfigurationException: Column
> family ID mismatch (found 8ad72660-f629-11eb-a217-e1a09d8bc60c; expected
> 20739eb0-d92e-11e6-b42f-e7eb6f21c481)"
> >     "\tat
> org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:949)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat
> org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:903)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat
> org.apache.cassandra.config.Schema.updateTable(Schema.java:687)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat
> org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1482)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1438)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1407)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1384)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat
> org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:91)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat 
> > org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53)
> [apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat 
> > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
> [apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [na:1.8.0_232]"
> >     "\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [na:1.8.0_232]"
> >     "\tat
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [na:1.8.0_232]"
> >     "\tat
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [na:1.8.0_232]"
> >     "\tat
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
> [apache-cassandra-3.11.5.jar:3.11.5]"
> >     "\tat java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_232]"
> >     ```
> >     This error repeated several times over the next 2 minutes.
> >
> > 3. While running `nodetool describecluster` repeatedly, I saw that the
> nodes in the 'dc2' datacenter converged to the new schema version quickly,
> but the nodes in the original datacenter ('dc1') remained at the previous
> schema version.
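
Schematically, the disagreement in `nodetool describecluster` looked like this (the version UUIDs and addresses here are invented for illustration):

```shell
nodetool describecluster
# Cluster Information:
#         ...
#         Schema versions:
#                 1b4818d5-...: [10.0.2.1, 10.0.2.2, 10.0.2.3]   <- dc2, new schema
#                 ea63e099-...: [10.0.1.1, 10.0.1.2, 10.0.1.3]   <- dc1, old schema
```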
> >
> > 4. I waited to see if all of the nodes would converge to the new schema
> version, but they still hadn't converged after roughly 10 minutes. Given
> the errors I saw, I wasn't optimistic it would work out all by itself, so I
> decided to restart the nodes in the 'dc1' datacenter one at a time so they
> would restart with the latest schema version.
> >
> > 5. After each node restarted, `nodetool describecluster` showed it as
> being on the latest schema version. So, after getting through all the 'dc1'
> nodes, it looked like everything in the cluster was healthy again.
> >
> > 6. However, that's when I noticed that there were two data directories
> on disk for each table in the keyspace. New writes for a table were being
> saved in the newer directory, but queries for data saved in the old data
> directory were returning no results.
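
This symptom is easy to check for mechanically, since Cassandra names each table directory `<table>-<32-hex cf-id>` and a healthy keyspace has exactly one directory per table. A small helper (my own sketch, not part of any Cassandra tooling) to flag duplicates in a data-directory listing:

```python
import re
from collections import defaultdict

def find_duplicate_table_dirs(dir_names):
    """Group Cassandra table directories, named <table>-<32-hex cf-id>,
    by table name; return the tables with more than one directory."""
    by_table = defaultdict(list)
    pattern = re.compile(r"^(.+)-([0-9a-f]{32})$")
    for name in dir_names:
        m = pattern.match(name)
        if m:
            by_table[m.group(1)].append(name)
    return {t: sorted(d) for t, d in by_table.items() if len(d) > 1}

# The two cf-ids here are the ones from the log above; table names are made up.
listing = [
    "mytable-20739eb0d92e11e6b42fe7eb6f21c481",
    "mytable-8ad72660f62911eba217e1a09d8bc60c",
    "healthy-0123456789abcdef0123456789abcdef",
]
print(find_duplicate_table_dirs(listing))
# {'mytable': ['mytable-20739eb0d92e11e6b42fe7eb6f21c481', 'mytable-8ad72660f62911eba217e1a09d8bc60c']}
```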
> >
> > 7. That's when I followed the recovery steps in the Datastax article
> with great success.
> >
> > ## Questions
> >
> > * My understanding is that running concurrent schema updates should
> always be avoided, since that can result in schema collisions. But, in this
> case, I wasn't performing multiple schema updates. I was just running a
> single `ALTER KEYSPACE` statement. Any idea why a single schema update
> would result in a schema collision and two data directories per table?
> >
> > * Should I have waited longer before restarting nodes? Perhaps, given
> enough time, the Cassandra nodes would have all converged on the correct
> schema version, and this would have resolved on its own?
> >
> > * Any suggestions for how I can avoid this problem in the future?
> >
> > --
> > Tom Offermann
> > Lead Software Engineer
> > http://newrelic.com
>


-- 
Tom Offermann
Lead Software Engineer
http://newrelic.com
