Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Erick Ramirez
I agree with Jeff that this isn't related to ALTER TABLE. FWIW, the original table was created in 2017 but a new version got created on August 5: - 20739eb0-d92e-11e6-b42f-e7eb6f21c481 - Friday, January 13, 2017 at 1:18:01 GMT - 8ad72660-f629-11eb-a217-e1a09d8bc60c - Thursday, August 5, 2
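The creation dates quoted above can be read straight out of the table IDs, because they are version 1 (time-based) UUIDs. As a minimal sketch, the helper below decodes that embedded timestamp with the Python standard library; the function name `table_id_created_at` is made up for illustration and is not part of Cassandra.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUID timestamps count from the start of the Gregorian calendar.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def table_id_created_at(table_id: str) -> datetime:
    """Return the UTC time embedded in a version 1 (time-based) UUID."""
    parsed = uuid.UUID(table_id)
    if parsed.version != 1:
        raise ValueError(f"{table_id} is not a version 1 UUID")
    # UUID.time is in 100-nanosecond intervals; convert to whole microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=parsed.time // 10)


# The two table IDs mentioned in the message above.
for table_id in (
    "20739eb0-d92e-11e6-b42f-e7eb6f21c481",
    "8ad72660-f629-11eb-a217-e1a09d8bc60c",
):
    print(table_id, table_id_created_at(table_id).isoformat())
```

Running this against any table ID shows when that version of the table was created, which helps tell an original table from one created later by a colliding schema change.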

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Tom Offermann
Jeff, Ahh...I see. That makes sense. I'll add this to the list of things to check before making a schema change. Thanks so much for taking the time to walk me through this. Really appreciate all of your help! On Fri, Oct 15, 2021 at 3:52 PM Jeff Jirsa wrote: > Consistency doesn't matter for sch

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Jeff Jirsa
Consistency doesn't matter for schema. For every host: "select id from system_schema.tables WHERE keyspace_name=? and table_name=?" ( https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/schema/SchemaKeyspace.java#L144 ) Then, compare that to the /path/to/data/key
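The per-host comparison described above could be sketched roughly as follows. This assumes the usual Cassandra 3.x layout, in which a table's data directory is named `<table_name>-<table id with the dashes removed>`. The helper names and the `events` table name are hypothetical. The directory names would come from listing the keyspace's data directory on each host, and the ID from the schema query in the message.

```python
import uuid


def expected_dir_name(table_name: str, table_id: str) -> str:
    """Assumed directory naming: <table_name>-<32 hex chars of the table ID>."""
    return f"{table_name}-{uuid.UUID(table_id).hex}"


def mismatched_table_dirs(table_name: str, table_id: str, dir_names: list[str]) -> list[str]:
    """Return directories for this table whose ID suffix differs from the schema ID."""
    expected = expected_dir_name(table_name, table_id)
    prefix = f"{table_name}-"
    return [
        name
        for name in dir_names
        # Only consider directories shaped like <table_name>-<32-char id>.
        if name.startswith(prefix)
        and len(name) == len(prefix) + 32
        and name != expected
    ]


# Hypothetical example: "events" is a made-up table name; the IDs are the
# two table IDs mentioned earlier in this thread.
on_disk = [
    "events-20739eb0d92e11e6b42fe7eb6f21c481",
    "events-8ad72660f62911eba217e1a09d8bc60c",
]
schema_id = "8ad72660-f629-11eb-a217-e1a09d8bc60c"
print(mismatched_table_dirs("events", schema_id, on_disk))
```

An empty result on every host would suggest that the on-disk directories agree with the ID stored in the schema tables. It says nothing about the in-memory schema, which is why heap dumps were also mentioned.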

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Tom Offermann
So, if I were to do `CONSISTENCY ALL; select *` from each of the system_schema tables, then on-disk and in-memory should be in sync? On Fri, Oct 15, 2021 at 3:38 PM Jeff Jirsa wrote: > Heap dumps + filesystem inspection + SELECT from schema tables. > > > On Fri, Oct 15, 2021 at 3:28 PM Tom Offer

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Jeff Jirsa
Heap dumps + filesystem inspection + SELECT from schema tables. On Fri, Oct 15, 2021 at 3:28 PM Tom Offermann wrote: > Interesting! > > Is there a way to determine if the on-disk schema and the in-memory schema > are in sync? Is there a way to force them to sync? If so, would it help to > force

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Tom Offermann
Interesting! Is there a way to determine if the on-disk schema and the in-memory schema are in sync? Is there a way to force them to sync? If so, would it help to force a sync before running an `ALTER KEYSPACE` schema change? On Fri, Oct 15, 2021 at 3:08 PM Jeff Jirsa wrote: > I would not expec

Re: update cassandra.yaml file on number of cluster nodes

2021-10-15 Thread Bowen Song
We have Cassandra on bare-metal servers, and we manage our servers via Ansible. In this use case, we create an Ansible playbook to update the servers one by one, change the cassandra.yaml file, restart Cassandra, and wait for it to finish the restart, and then wait for a few minutes before movi
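The one-node-at-a-time pattern described above can be outlined in a few lines. This is only a sketch of the control flow, not the poster's playbook: every callable (`apply_change`, `restart`, `is_up`) is a hypothetical hook that you would supply, and the default wait times are placeholders.

```python
import time
from typing import Callable, Iterable


def rolling_update(
    hosts: Iterable[str],
    apply_change: Callable[[str], None],   # hypothetical: edit cassandra.yaml on the host
    restart: Callable[[str], None],        # hypothetical: restart Cassandra on the host
    is_up: Callable[[str], bool],          # hypothetical: True once the node is back up
    settle_seconds: float = 180.0,         # placeholder pause before the next node
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Update hosts strictly one at a time, waiting for each to recover."""
    for host in hosts:
        apply_change(host)
        restart(host)
        # Block until the node reports healthy before touching the next one.
        while not is_up(host):
            sleep(poll_seconds)
        sleep(settle_seconds)
```

In Ansible itself, the play-level `serial: 1` setting gives the same one-host-at-a-time behaviour.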

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Jeff Jirsa
I would not expect an ALTER KEYSPACE to introduce a divergent CFID, that usually happens during a CREATE TABLE. With no other evidence or ability to debug, I would guess that the CFIDs diverged previously, but due to the race(s) I described, the on-disk schema and the in-memory schema differed, and

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Tom Offermann
Jeff, Thanks for describing the race condition. I understand that performing concurrent schema changes is dangerous, and that running an `ALTER KEYSPACE` on one node, and then running another `ALTER KEYSPACE` on a different node, before the first has fully propagated throughout the cluster, can l

update cassandra.yaml file on number of cluster nodes

2021-10-15 Thread ZAIDI, ASAD
Hello Folks, Can you please suggest a tool or approach to efficiently update the cassandra.yaml file in a multi-DC environment with a large number of nodes? Thank you. Asad

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Tom Offermann
Vytenis, I ran the `ALTER KEYSPACE` command on one of the original `dc1` nodes. Should it make any difference? My understanding was that it could be run from any node in either datacenter. But, if there's a reason to prefer running it on a new datacenter node, I'm happy to do it that way. --Tom

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Tom Offermann
Stefan, Yes, this is probably one of many good reasons to upgrade! Upgrading to Cassandra 4.0 is definitely on our roadmap, but we're hoping to do these migrations first before we upgrade. However, if we keep running into this problem, we may have to rethink that ordering. --Tom On Wed, Oct 13

Re: TWCS not cleaning up data as fast as expected

2021-10-15 Thread Bowen Song
I noticed the table default TTL is 1 day, but the SSTable's max TTL is 3 days. Any idea why this would happen? Does any INSERT/UPDATE statement have a TTL longer than the table's default? The min timestamp is also very odd: it's in 2017. Do you insert data using very old timestamps? On 15/10/2