Hello Jeff, Bowen,

Thanks for your answers. I now understand that Cassandra has a bug that prevents it from handling concurrent schema modifications. I was not aware of how severe it is; I thought that temporary schema mismatches were eventually resolved smartly, by some kind of "merge" mechanism.

In my use cases, keyspaces and tables are created on demand: when an insert fails with an exception for an invalid keyspace or table, the keyspace and table are created and the insert is retried. I cannot afford to funnel schema modifications through a single bottleneck, but I can tolerate the data inconsistencies while waiting for a fix in Cassandra.

I'm more worried about tombstones in the system tables. I assume that 8 tombstones per day (or even more, but no more than a few dozen) is reasonable. Can you confirm (or refute) that, please?
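The figures discussed in this thread can be checked with a back-of-the-envelope sketch. It assumes (hypothetically, not from Cassandra documentation) that each dropped table leaves exactly one leftover data directory, and that a tombstone cannot be purged until it is older than the grace period, so the steady-state number still inside that window is roughly drops per day × grace days × tombstones per drop. `tombstones_per_drop` defaults to 1 to match the "8 tombstones per day" assumption above; the real per-drop count across the system tables is not stated in the thread. The 65K directory limit is also taken from the thread as a parameter.

```python
# Back-of-the-envelope estimates for the numbers in this thread.
# Assumptions (hypothetical): one leftover directory per dropped table, and
# tombstones stay unpurgeable for exactly the grace period.

DAYS_PER_YEAR = 365.25


def years_until_dir_limit(drops_per_day: float, dir_limit: int = 65_000) -> float:
    """Years until a keyspace directory accumulates `dir_limit` leftover folders."""
    return dir_limit / drops_per_day / DAYS_PER_YEAR


def unpurgeable_tombstones(drops_per_day: float, grace_days: float,
                           tombstones_per_drop: int = 1) -> float:
    """Steady-state count of tombstones still younger than the grace period."""
    return drops_per_day * grace_days * tombstones_per_drop


if __name__ == "__main__":
    drops = 2 * (24 / 6)  # 2 tables dropped every 6 hours -> 8 per day
    print(f"drops/day: {drops:.0f}")                                          # drops/day: 8
    print(f"years to 65K folders: {years_until_dir_limit(drops):.1f}")        # 22.2
    print(f"tombstones in a 10-day grace: {unpurgeable_tombstones(drops, 10):.0f}")  # 80
```

With 8 drops per day this gives about 22 years before reaching 65K folders, and about 80 tombstones per tombstone-per-drop inside a 10-day window; multiplying by the actual number of tombstones a single DROP TABLE writes gives the figure to compare against "a few dozen".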
Sébastien.

On Wed, 6 Dec 2023 at 03:00, Bowen Song via user <user@cassandra.apache.org> wrote:
> The same table name with two different CF IDs is not just "temporary
> schema disagreements", it's much worse than that. This breaks the eventual
> consistency guarantee and leads to silent data corruption. It's silently
> happening in the background, and you don't realise it until you suddenly
> do, and then everything seems to blow up at the same time. You need to sort
> this out ASAP.
>
>
> On 05/12/2023 19:57, Sébastien Rebecchi wrote:
>
> Hi Bowen,
>
> Thanks for your answer.
>
> I was thinking of extreme use cases, but as far as I am concerned I can
> deal with creating and deleting 2 tables every 6 hours per keyspace.
> That leaves around 8 folders of deleted tables per day, sometimes more,
> because I sometimes see 2 folders created for the same table name, with 2
> different IDs, caused by temporary schema disagreements I guess.
> Basically it means about 20 years before the KS folder has 65K subfolders,
> so I would say I have time to think about redesigning the data model ^^
> Nevertheless, does that sound like too much in terms of tombstones in the
> system tables (with the default GC grace period of 10 days)?
>
> Sébastien.
>
> On Tue, 5 Dec 2023, 12:19, Bowen Song via user <user@cassandra.apache.org>
> wrote:
>
>> Please rethink your use case. Creating and deleting tables concurrently
>> often leads to schema disagreement. Even doing so sequentially on a single
>> node will lead to a large number of tombstones in the system tables.
>> On 04/12/2023 19:55, Sébastien Rebecchi wrote:
>>
>> Thank you Dipan.
>>
>> Do you know if there is a good reason for Cassandra to leave table folders
>> behind even when there is no snapshot?
>>
>> I'm thinking of use cases where there is a need to create and delete
>> small tables at a high rate. You could quickly end up with more than 65K
>> (the ext4 limit) subdirectories in the KS directory, while 99.9...% of them
>> are remnants of deleted tables.
>>
>> It seems quite dirty of Cassandra not to clean up its own "garbage" by
>> itself, and quite dangerous for the end user to have to do it alone, don't
>> you think?
>>
>> Thanks,
>>
>> Sébastien.
>>
>> On Mon, 4 Dec 2023, 11:28, Dipan Shah <dipan....@hotmail.com> wrote:
>>
>>> Hello Sebastien,
>>>
>>> There are no built-in tools that will automatically remove folders of
>>> deleted tables.
>>>
>>> Thanks,
>>>
>>> Dipan Shah
>>> ------------------------------
>>> *From:* Sébastien Rebecchi <srebec...@kameleoon.com>
>>> *Sent:* 04 December 2023 13:54
>>> *To:* user@cassandra.apache.org <user@cassandra.apache.org>
>>> *Subject:* Remove folders of deleted tables
>>>
>>> Hello,
>>>
>>> When we delete a table with Cassandra, it leaves the folder of that table
>>> on the file system, even if there is no snapshot (auto snapshots disabled).
>>> So we end up with the empty folder {data folder}/{keyspace name}/{table
>>> name-table id} containing only 1 subfolder, backups, which is itself empty.
>>> Is there a way to automatically remove folders of deleted tables?
>>>
>>> Sébastien.
>>>
>>
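Since there is no built-in tool for this, one might start from a read-only listing of candidate leftovers. The sketch below assumes the layout described in the original message, {data folder}/{keyspace name}/{table name-table id}, with the table ID written as the 32 hex characters of its UUID without dashes, and assumes the caller has already collected the IDs of live tables (for example from the `id` column of `system_schema.tables`). The function name `leftover_table_dirs` is hypothetical. It only lists directories whose ID is not live and which contain no regular files (such as one holding just an empty `backups` subfolder); it deletes nothing. Given the warning above about the same table name appearing with two different IDs, any directory it reports should still be checked by hand before removal.

```python
import tempfile
import uuid
from pathlib import Path


def leftover_table_dirs(keyspace_dir: Path, live_ids: set[uuid.UUID]) -> list[Path]:
    """List table directories whose ID is not live and that hold no files.

    Assumes (hypothetically) that each table directory is named
    '<table name>-<32 hex chars of the table UUID>'. Read-only: nothing is deleted.
    """
    live_hex = {u.hex for u in live_ids}  # UUID.hex is the 32-char form without dashes
    candidates = []
    for entry in sorted(keyspace_dir.iterdir()):
        if not entry.is_dir():
            continue
        _, sep, table_id = entry.name.rpartition("-")
        if not sep or len(table_id) != 32 or table_id in live_hex:
            continue  # unexpected name format, or a table that still exists
        # "Empty" means no regular files anywhere below, e.g. only an empty backups/
        if not any(p.is_file() for p in entry.rglob("*")):
            candidates.append(entry)
    return candidates


if __name__ == "__main__":
    # Demo on a throwaway directory tree instead of a real Cassandra data folder.
    with tempfile.TemporaryDirectory() as tmp:
        ks = Path(tmp)
        live, dropped = uuid.uuid4(), uuid.uuid4()
        (ks / f"events-{live.hex}" / "backups").mkdir(parents=True)
        (ks / f"events-{dropped.hex}" / "backups").mkdir(parents=True)
        for path in leftover_table_dirs(ks, {live}):
            print(path.name)  # only the directory of the dropped table
```

Keeping the function read-only is a deliberate choice: it separates finding candidates from removing them, so an operator can compare the list against the schema on every node first.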