Re: Schema collision results in multiple data directories per table

Tom Offermann Fri, 15 Oct 2021 14:21:50 -0700

Vytenis,

I ran the `ALTER KEYSPACE` command on one of the original `dc1` nodes.

Should that make any difference? My understanding was that it could be run
from any node in either datacenter. But if there's a reason to prefer
running it on a node in the new datacenter, I'm happy to do it that way.

--Tom

On Wed, Oct 13, 2021 at 10:22 AM vytenis silgalis <vsilga...@gmail.com>
wrote:

> Did you run the `ALTER KEYSPACE` command on the original dc1 nodes or on
> the new dc2 nodes?
>
> On Wed, Oct 13, 2021 at 8:15 AM Stefan Miklosovic <
> stefan.mikloso...@instaclustr.com> wrote:
>
>> Hi Tom,
>>
>> While I am not completely sure what might cause your issue, I want to
>> point out that schema agreement was overhauled substantially in 4.0 (1),
>> so your issue may be related to what that ticket was trying to fix.
>>
>> Regards
>>
>> (1) https://issues.apache.org/jira/browse/CASSANDRA-15158
>>
>> On Fri, 1 Oct 2021 at 18:43, Tom Offermann <tofferm...@newrelic.com>
>> wrote:
>> >
>> > When adding a datacenter to a keyspace (following The Last Pickle's [Data
>> > Center Switch][lp] playbook), I ran into a "Configuration exception
>> > merging remote schema" error. The nodes in one datacenter didn't converge
>> > to the new schema version, and after restarting them, I saw the symptoms
>> > described in this DataStax article on [Fixing a table schema
>> > collision][ds]: there were two data directories for each table in the
>> > keyspace on the nodes that didn't converge. I followed the recovery steps
>> > in the DataStax article to move the data from the older directories to
>> > the new directories, ran `nodetool refresh`, and that fixed the problem.
>> >
>> > [lp]: https://thelastpickle.com/blog/2019/02/26/data-center-switch.html
>> > [ds]: https://docs.datastax.com/en/dse/6.0/cql/cql/cql_using/useCreateTableCollisionFix.html
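The duplicate directories described above can be found mechanically. Cassandra names each table's data directory `<table>-<table ID>`, with the table ID written as 32 hex digits without dashes, so a collision shows up as two directories that share a table name. A minimal Python sketch, assuming a keyspace data directory such as `/var/lib/cassandra/data/foo` (that path and the helper names `find_duplicate_table_dirs` and `dashed_uuid` are illustrative, not part of Cassandra's tooling):

```python
import os
import re
from collections import defaultdict

# Assumed directory layout: <keyspace_dir>/<table>-<32 hex digits>.
# CQL table names use only letters, digits and underscores, so the
# first "-" separates the table name from its ID.
TABLE_DIR = re.compile(r"^(?P<table>\w+)-(?P<table_id>[0-9a-f]{32})$")


def find_duplicate_table_dirs(keyspace_dir):
    """Return {table_name: [directory names]} for tables with 2+ directories."""
    dirs_by_table = defaultdict(list)
    for entry in sorted(os.listdir(keyspace_dir)):
        if not os.path.isdir(os.path.join(keyspace_dir, entry)):
            continue  # skip stray files
        match = TABLE_DIR.match(entry)
        if match:
            dirs_by_table[match.group("table")].append(entry)
    return {table: dirs for table, dirs in dirs_by_table.items() if len(dirs) > 1}


def dashed_uuid(table_id_hex):
    """Turn a directory's 32-digit table ID into the dashed form used in logs."""
    h = table_id_hex
    return f"{h[:8]}-{h[8:12]}-{h[12:16]}-{h[16:20]}-{h[20:]}"
```

Passing the suffix after the last `-` of each reported directory name to `dashed_uuid` gives IDs that can be compared with the `found`/`expected` IDs in the error log and with the `id` column the node reports for the table in `system_schema.tables`.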
>> >
>> > While the DataStax article was very helpful in recovering, I'm left
>> > wondering *why* this happened. If anyone can shed some light on that, or
>> > offer advice on how I can avoid getting into this situation in the
>> > future, I would be most appreciative. I'll describe the steps I took in
>> > more detail below.
>> >
>> > ## Steps
>> >
>> > 1. The day before, I had added the second datacenter ('dc2') to the
>> > system_traces, system_distributed, and system_auth keyspaces and ran
>> > `nodetool rebuild` for each of those 3 keyspaces. All of that went
>> > smoothly with no issues.
>> >
>> > 2. For a large keyspace, I added the second datacenter ('dc2') with an
>> > `ALTER KEYSPACE foo WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '2', 'dc2': '3'};`
>> > statement. Immediately, I saw this error in the log:
>> >     ```
>> >     ERROR 16:45:47 Exception in thread Thread[MigrationStage:1,5,main]
>> >     org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 8ad72660-f629-11eb-a217-e1a09d8bc60c; expected 20739eb0-d92e-11e6-b42f-e7eb6f21c481)
>> >         at org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:949) ~[apache-cassandra-3.11.5.jar:3.11.5]
>> >         at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:903) ~[apache-cassandra-3.11.5.jar:3.11.5]
>> >         at org.apache.cassandra.config.Schema.updateTable(Schema.java:687) ~[apache-cassandra-3.11.5.jar:3.11.5]
>> >         at org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1482) ~[apache-cassandra-3.11.5.jar:3.11.5]
>> >         at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1438) ~[apache-cassandra-3.11.5.jar:3.11.5]
>> >         at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1407) ~[apache-cassandra-3.11.5.jar:3.11.5]
>> >         at org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1384) ~[apache-cassandra-3.11.5.jar:3.11.5]
>> >         at org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:594) ~[apache-cassandra-3.11.5.jar:3.11.5]
>> >         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.5.jar:3.11.5]
>> >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_232]
>> >         at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_232]
>> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_232]
>> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_232]
>> >         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) [apache-cassandra-3.11.5.jar:3.11.5]
>> >         at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_232]
>> >     ```
>> >
>> >     I also saw this:
>> >     ```
>> >     ERROR 16:46:48 Configuration exception merging remote schema
>> >     org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 8ad72660-f629-11eb-a217-e1a09d8bc60c; expected 20739eb0-d92e-11e6-b42f-e7eb6f21c481)
>> >         at org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:949) ~[apache-cassandra-3.11.5.jar:3.11.5]
>> >         at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:903) ~[apache-cassandra-3.11.5.jar:3.11.5]
>> >         at org.apache.cassandra.config.Schema.updateTable(Schema.java:687) ~[apache-cassandra-3.11.5.jar:3.11.5]
>> >         at org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1482) ~[apache-cassandra-3.11.5.jar:3.11.5]
>> >         at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1438) ~[apache-cassandra-3.11.5.jar:3.11.5]
>> >         at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1407) ~[apache-cassandra-3.11.5.jar:3.11.5]
>> >         at org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1384) ~[apache-cassandra-3.11.5.jar:3.11.5]
>> >         at org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:91) ~[apache-cassandra-3.11.5.jar:3.11.5]
>> >         at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53) [apache-cassandra-3.11.5.jar:3.11.5]
>> >         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) [apache-cassandra-3.11.5.jar:3.11.5]
>> >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_232]
>> >         at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_232]
>> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_232]
>> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_232]
>> >         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) [apache-cassandra-3.11.5.jar:3.11.5]
>> >         at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_232]
>> >     ```
>> >     This error repeated several times over the next 2 minutes.
>> >
>> > 3. While running `nodetool describecluster` repeatedly, I saw that the
>> > nodes in the 'dc2' datacenter converged to the new schema version
>> > quickly, but the nodes in the original datacenter ('dc1') remained at the
>> > previous schema version.
>> >
>> > 4. I waited to see if all of the nodes would converge to the new schema
>> > version, but they still hadn't converged after roughly 10 minutes. Given
>> > the errors I saw, I wasn't optimistic it would work out all by itself, so
>> > I decided to restart the nodes in the 'dc1' datacenter one at a time so
>> > they would restart with the latest schema version.
>> >
>> > 5. After each node restarted, `nodetool describecluster` showed it as
>> > being on the latest schema version. So, after getting through all the
>> > 'dc1' nodes, it looked like everything in the cluster was healthy again.
>> >
>> > 6. However, that's when I noticed that there were two data directories
>> > on disk for each table in the keyspace. New writes for a table were
>> > being saved in the newer directory, but queries for data saved in the old
>> > data directory were returning no results.
>> >
>> > 7. That's when I followed the recovery steps in the DataStax article
>> > with great success.
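The two IDs in the `Column family ID mismatch` error from step 2 can be dated. They appear to be time-based (version 1) UUIDs (the `1` at the start of each ID's third group is the UUID version digit), and Cassandra 3.x generates new table IDs that way, so the timestamp embedded in each ID records roughly when that table definition was created. A minimal Python sketch using only the standard `uuid` module; the helper name `table_id_created_at` is hypothetical:

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version-1 UUID timestamps count 100-nanosecond intervals since the
# Gregorian calendar reform, 1582-10-15 00:00 UTC.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def table_id_created_at(table_id):
    """Return the UTC time embedded in a version-1 (time-based) table ID."""
    u = uuid.UUID(table_id)
    if u.version != 1:
        raise ValueError(f"{table_id} is not a time-based (version 1) UUID")
    # u.time is the 60-bit timestamp in 100 ns units; // 10 converts to microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


# IDs copied from the "Column family ID mismatch" error in step 2.
for label, table_id in [
    ("found", "8ad72660-f629-11eb-a217-e1a09d8bc60c"),
    ("expected", "20739eb0-d92e-11e6-b42f-e7eb6f21c481"),
]:
    print(label, table_id_created_at(table_id).isoformat())
```

For these two IDs, the sketch places the `expected` ID in January 2017 and the `found` ID in August 2021. If the `ALTER KEYSPACE` was run shortly before this thread began, that would mean the conflicting ID predates the statement; the timestamp alone does not say which node generated it.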
>> >
>> > ## Questions
>> >
>> > * My understanding is that running concurrent schema updates should
>> > always be avoided, since that can result in schema collisions. But, in
>> > this case, I wasn't performing multiple schema updates. I was just
>> > running a single `ALTER KEYSPACE` statement. Any idea why a single schema
>> > update would result in a schema collision and two data directories per
>> > table?
>> >
>> > * Should I have waited longer before restarting nodes? Perhaps, given
>> > enough time, the Cassandra nodes would have all converged on the correct
>> > schema version, and this would have resolved on its own?
>> >
>> > * Any suggestions for how I can avoid this problem in the future?
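On the question of how long to wait, one option is to poll `nodetool describecluster` until every node reports the same schema version, with an explicit deadline instead of an open-ended wait. A minimal Python sketch, assuming the 3.11 output format in which each line under `Schema versions:` reads `<version>: [<addresses>]` (unreachable nodes are listed under `UNREACHABLE`). The function names and the default timeout and interval are hypothetical; `fetch` stands in for however you run `nodetool`, and `sleep`/`clock` are injectable so the logic can be exercised without a cluster:

```python
import re
import time

# Assumed line format under "Schema versions:", e.g.
#     86afa796-d883-3932-aa73-6b017cef0d19: [10.0.0.1, 10.0.0.2]
VERSION_LINE = re.compile(r"^\s*(?P<version>\S+): \[(?P<hosts>[^\]]*)\]\s*$")


def parse_schema_versions(describecluster_output):
    """Map each schema version (or UNREACHABLE) to the hosts reporting it."""
    versions = {}
    in_section = False
    for line in describecluster_output.splitlines():
        if line.strip() == "Schema versions:":
            in_section = True
            continue
        if in_section:
            match = VERSION_LINE.match(line)
            if not match:
                break  # first non-matching line ends the section
            hosts = [h.strip() for h in match.group("hosts").split(",") if h.strip()]
            versions[match.group("version")] = hosts
    return versions


def wait_for_schema_agreement(fetch, timeout_s=600, interval_s=10,
                              sleep=time.sleep, clock=time.monotonic):
    """Poll until exactly one reachable schema version is reported.

    Returns the agreed version, or None if the deadline passes first.
    """
    deadline = clock() + timeout_s
    while True:
        versions = parse_schema_versions(fetch())
        if len(versions) == 1 and "UNREACHABLE" not in versions:
            return next(iter(versions))
        if clock() >= deadline:
            return None
        sleep(interval_s)
```

In practice `fetch` could be `lambda: subprocess.run(["nodetool", "describecluster"], capture_output=True, text=True, check=True).stdout`. A `None` result only says that agreement was not reached within the chosen window; it does not answer whether the `dc1` nodes would eventually have converged on their own.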
>> >
>> > --
>> > Tom Offermann
>> > Lead Software Engineer
>> > http://newrelic.com
>>
>

-- 
Tom Offermann
Lead Software Engineer
http://newrelic.com
