Hello Jeff, Bowen

Thanks for your answers.
Now I understand that there is a bug in Cassandra that prevents it from
handling concurrent schema modifications. I was not aware of how severe it
is; I thought that temporary schema mismatches were eventually resolved
smartly, by some kind of "merge" mechanism.
In my use case, keyspaces and tables are created on demand: when an insert
fails with an exception for an invalid keyspace or table, the keyspace and
table are created and the insert is retried. I cannot afford to centralize
schema modifications in a bottleneck, but I can tolerate the data
inconsistencies while waiting for a fix in Cassandra.
I'm more worried about tombstones in the system tables. I assume that 8
tombstones per day (or somewhat more, but no more than a few dozen) is
reasonable. Can you confirm (or refute) that, please?

Sébastien.
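A rough back-of-the-envelope check of the figures in this thread, as a minimal Python sketch. It assumes 8 table drops per day (2 tables re-created every 6 hours), the ~65K ext4 subdirectory limit quoted below, and the 10-day GC grace period stated in the thread; the real gc_grace_seconds of the system tables should be checked rather than taken from here. SCHEMA_ROWS_PER_DROP is a hypothetical value: a single drop likely deletes several schema rows (e.g. the table's row plus one row per column), so the tombstone count is probably a multiple of the number of drops, not equal to it.

```python
# Back-of-the-envelope estimates for the figures discussed in this thread.
# Constants come from the discussion, except where marked hypothetical.

DROPS_PER_DAY = 8          # 2 tables dropped and re-created every 6 hours
SUBDIR_LIMIT = 65_000      # approximate ext4 subdirectory limit quoted in the thread
GC_GRACE_DAYS = 10         # GC grace period as stated in the thread (864000 s)
SCHEMA_ROWS_PER_DROP = 6   # hypothetical: 1 table row + 5 column rows


def years_until_subdir_limit(drops_per_day: float, limit: int) -> float:
    """Years until leftover table folders reach the subdirectory limit."""
    return limit / drops_per_day / 365


def tombstones_within_gc_grace(drops_per_day: float, gc_grace_days: float,
                               rows_per_drop: int) -> float:
    """Schema tombstones still inside the grace period, in steady state.

    This is a lower bound: older tombstones remain until a compaction
    actually purges them.
    """
    return drops_per_day * gc_grace_days * rows_per_drop


if __name__ == "__main__":
    years = years_until_subdir_limit(DROPS_PER_DAY, SUBDIR_LIMIT)
    tombstones = tombstones_within_gc_grace(
        DROPS_PER_DAY, GC_GRACE_DAYS, SCHEMA_ROWS_PER_DROP)
    print(f"{years:.1f} years until the subdirectory limit")
    print(f"{tombstones:.0f} schema tombstones within the grace period")
```

With these inputs the first estimate comes out at roughly 22 years, consistent with the "around 20 years" figure below; the tombstone estimate scales linearly with the drop rate, the grace period, and the number of schema rows each drop deletes.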

On Wed, 6 Dec 2023 at 03:00, Bowen Song via user <user@cassandra.apache.org>
wrote:

> The same table name with two different CF IDs is not just a "temporary
> schema disagreement"; it's much worse than that. It breaks the eventual
> consistency guarantee and leads to silent data corruption. It happens
> silently in the background, and you don't realise it until you suddenly
> do, and then everything seems to blow up at the same time. You need to sort
> this out ASAP.
>
>
> On 05/12/2023 19:57, Sébastien Rebecchi wrote:
>
> Hi Bowen,
>
> Thanks for your answer.
>
> I was thinking of extreme use cases, but as far as I am concerned, I can
> live with creating and deleting 2 tables every 6 hours per keyspace.
> That leaves around 8 folders of deleted tables per day, sometimes more,
> because I sometimes see 2 folders created for the same table name with 2
> different IDs, caused by temporary schema disagreements, I guess.
> Basically, it means about 20 years before the keyspace folder has 65K
> subfolders, so I would say I have time to think about redesigning the data
> model ^^
> Nevertheless, does that sound like too much in terms of tombstones in the
> system tables (with the default GC grace period of 10 days)?
>
> Sébastien.
>
> On Tue, 5 Dec 2023, 12:19, Bowen Song via user <user@cassandra.apache.org>
> wrote:
>
>> Please rethink your use case. Creating and deleting tables concurrently
>> often leads to schema disagreement. Even doing so sequentially on a single
>> node will lead to a large number of tombstones in the system tables.
>>
>> On 04/12/2023 19:55, Sébastien Rebecchi wrote:
>>
>> Thank you Dipan.
>>
>> Do you know if there is a good reason for Cassandra to leave table folders
>> behind even when there is no snapshot?
>>
>> I'm thinking of use cases that need to create and delete small tables at
>> a high rate. You could quickly end up with more than 65K subdirectories
>> (the ext4 limit) in the keyspace directory, while 99.9...% of them are
>> leftovers of deleted tables.
>>
>> It seems quite dirty of Cassandra not to clean up its own "garbage" by
>> itself, and quite dangerous for the end user to have to do it alone, don't
>> you think?
>>
>> Thanks,
>>
>> Sébastien.
>>
>> On Mon, 4 Dec 2023, 11:28, Dipan Shah <dipan....@hotmail.com> wrote:
>>
>>> Hello Sebastien,
>>>
>>> There are no inbuilt tools that will automatically remove folders of
>>> deleted tables.
>>>
>>> Thanks,
>>>
>>> Dipan Shah
>>> ------------------------------
>>> *From:* Sébastien Rebecchi <srebec...@kameleoon.com>
>>> *Sent:* 04 December 2023 13:54
>>> *To:* user@cassandra.apache.org <user@cassandra.apache.org>
>>> *Subject:* Remove folders of deleted tables
>>>
>>> Hello,
>>>
>>> When we delete a table in Cassandra, it leaves the folder of that table
>>> on the file system, even if there is no snapshot (auto snapshots disabled).
>>> So we end up with the folder {data folder}/{keyspace name}/{table
>>> name-table id}, containing only 1 subfolder, backups, which is itself
>>> empty.
>>> Is there a way to automatically remove folders of deleted tables?
>>>
>>> Sébastien.
>>>
>>

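Since there is no built-in tool, a manual cleanup could look like the minimal Python sketch below. It assumes the layout described in the original question: each table lives in {data folder}/{keyspace name}/{table name-table id}, and a deleted table leaves a folder holding only an empty backups subfolder. The function names are hypothetical, and the caller must supply the set of directory names of tables that still exist (for example derived from `SELECT table_name, id FROM system_schema.tables WHERE keyspace_name = ...`, assuming the folder name is the table name, a dash, and the id without its dashes). Folders outside the live schema that still contain anything, such as data files or snapshots, are deliberately left alone, since with duplicate table IDs they may still hold data. It defaults to a dry run, and should only be run while no schema change is in flight, because a table created after the live set was read would look like a leftover.

```python
import shutil
from pathlib import Path


def is_leftover(table_dir: Path) -> bool:
    """True if the folder is empty or holds nothing but an empty 'backups' folder."""
    entries = list(table_dir.iterdir())
    if not entries:
        return True
    return (
        len(entries) == 1
        and entries[0].name == "backups"
        and entries[0].is_dir()
        and not any(entries[0].iterdir())
    )


def find_leftover_table_dirs(keyspace_dir: Path, live_dir_names: set[str]) -> list[Path]:
    """List table folders that are not in the live schema and contain no data."""
    candidates = []
    for child in sorted(keyspace_dir.iterdir()):
        if not child.is_dir() or child.name in live_dir_names:
            continue  # skip plain files and folders of tables that still exist
        if is_leftover(child):
            candidates.append(child)
    return candidates


def remove_leftover_table_dirs(keyspace_dir: Path, live_dir_names: set[str],
                               dry_run: bool = True) -> list[Path]:
    """Delete (or, with dry_run=True, only report) leftover folders of deleted tables."""
    leftovers = find_leftover_table_dirs(keyspace_dir, live_dir_names)
    for path in leftovers:
        if dry_run:
            print(f"would remove {path}")
        else:
            shutil.rmtree(path)
    return leftovers
```

Running it once with the default dry run and reviewing the printed paths before passing dry_run=False keeps the destructive step explicit.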