Re: R/W timeouts VS number of tables in keyspace

Scott Hirleman Thu, 22 Jul 2021 16:01:14 -0700

I feel like that calls for an anti-pattern -> success blog post Luca 🤣


On Tue, Jul 20, 2021 at 9:17 AM Luca Rondanini <luca.rondan...@gmail.com>
wrote:

> Thanks Sean,
>
> I'm switching to G1 in order to gain some time while refactoring. I should
> be able to go down to 4 tables! Yes, the original design was that poor.
>
> Thanks again
>
> On Tue, Jul 20, 2021 at 6:41 AM Durity, Sean R <
> sean_r_dur...@homedepot.com> wrote:
>
>> Each table in the cluster has its own memtable, which is why you do not
>> want to fracture the memory into 900+ slices. The rule of thumb I have
>> followed is to stay in the low hundreds of tables (maybe 200) for the whole
>> cluster. I would require the hard refactoring (or move tables to
>> different clusters) immediately, since you really need to cut at
>> least 700 tables. You are seeing the memory impact.
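
To put rough numbers on the memtable point above, here is a minimal Python sketch. It assumes the memtable heap budget is about a quarter of the JVM heap (the default cassandra.yaml describes when memtable_heap_space_in_mb is left unset) and that the budget is split evenly across tables, which a real cluster will not do exactly. The 2048 MiB heap and the function name are hypothetical, not taken from this thread.

```python
def memtable_slice_mib(heap_mib, tables, memtable_fraction=0.25):
    """Average memtable heap budget per table, in MiB.

    Assumes an even split across tables; memtable_fraction=0.25 is the
    assumed default share of the heap reserved for memtables.
    """
    if tables <= 0:
        raise ValueError("tables must be positive")
    return heap_mib * memtable_fraction / tables


if __name__ == "__main__":
    heap_mib = 2048  # hypothetical heap size, not from the thread
    for tables in (900, 200, 4):
        slice_mib = memtable_slice_mib(heap_mib, tables)
        print(f"{tables:>4} tables -> {slice_mib:.2f} MiB per table")
```

Under those assumptions, 900 tables leaves well under 1 MiB of memtable space per table, while 4 tables leaves over 100 MiB each, which is the "slicing" effect described above.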
>>
>>
>>
>> In addition, in my experience, CMS is much harder to tune. G1GC works
>> well in my use cases without much tuning (or Java-guru level knowledge).
>> However, I don’t think that you will be able to engineer around the 900+
>> tables, no matter which GC you use.
>>
>>
>>
>> Sean Durity – Staff Systems Engineer, Cassandra
>>
>>
>>
>> *From:* Luca Rondanini <luca.rondan...@gmail.com>
>> *Sent:* Monday, July 19, 2021 11:34 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* [EXTERNAL] R/W timeouts VS number of tables in keyspace
>>
>>
>>
>> Hi all,
>>
>>
>>
>> I have a keyspace with almost 900 tables.
>>
>>
>>
>> Lately I started receiving lots of read/write timeouts, e.g.
>> com.datastax.driver.core.exceptions.Read/WriteTimeoutException: Cassandra
>> timeout during write query at consistency LOCAL_ONE (1 replica were
>> required but only 0 acknowledged the write).
>>
>>
>>
>> *I'm even experiencing nodes crashing.*
>>
>>
>>
>> In the logs I get many warnings like:
>>
>>
>>
>> WARN  [Service Thread]....GCInspector.java:282 - ConcurrentMarkSweep GC in 4025ms.  CMS Old Gen: 2141569800 -> 2116170568; Par Eden Space: 167772160 -> 0; Par Survivor Space: 20971520 -> 0
>>
>> WARN  [GossipTasks:1].....FailureDetector.java:288 - Not marking nodes down due to local pause of 5038005208 > 5000000000
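
The two warnings above already contain the key numbers. As a minimal sketch (standard-library `re` only; the helper names are hypothetical, and reading the FailureDetector values as nanoseconds is my assumption), this Python snippet extracts the pause length and how much of the old generation the collection actually freed:

```python
import re

# Log lines copied from the message above (elisions kept as posted).
GC_LINE = (
    "WARN  [Service Thread]....GCInspector.java:282 - ConcurrentMarkSweep GC "
    "in 4025ms.  CMS Old Gen: 2141569800 -> 2116170568; "
    "Par Eden Space: 167772160 -> 0; Par Survivor Space: 20971520 -> 0"
)
PAUSE_LINE = (
    "WARN  [GossipTasks:1].....FailureDetector.java:288 - Not marking nodes "
    "down due to local pause of 5038005208 > 5000000000"
)

PAUSE_RE = re.compile(r"GC in (\d+)ms")
POOL_RE = re.compile(r"([A-Za-z ]+?): (\d+) -> (\d+)")
LOCAL_PAUSE_RE = re.compile(r"local pause of (\d+) > (\d+)")


def summarize_gc(line):
    """Return (pause_ms, {pool_name: (bytes_before, bytes_after)})."""
    pause_ms = int(PAUSE_RE.search(line).group(1))
    pools = {
        name.strip(): (int(before), int(after))
        for name, before, after in POOL_RE.findall(line)
    }
    return pause_ms, pools


def local_pause_seconds(line):
    """Return (observed, threshold) in seconds, assuming nanosecond values."""
    observed, threshold = LOCAL_PAUSE_RE.search(line).groups()
    return int(observed) / 1e9, int(threshold) / 1e9


if __name__ == "__main__":
    pause_ms, pools = summarize_gc(GC_LINE)
    before, after = pools["CMS Old Gen"]
    freed_pct = 100 * (before - after) / before
    print(f"GC pause: {pause_ms} ms, old gen freed: {freed_pct:.2f}%")
    print("local pause (s) vs threshold (s):", local_pause_seconds(PAUSE_LINE))
```

A collection that takes about 4 seconds and frees only about 1% of the old generation suggests the heap is full of live data rather than garbage, which fits the too-many-memtables explanation better than a GC tuning problem.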
>>
>> I know 900 tables is a design error for C*, but before a super painful
>> refactoring I'd like to rule out any configuration problem. Any suggestions?
>>
>>
>>
>> Thanks a lot,
>>
>> Luca
>

-- 
Scott Hirleman
scott.hirle...@gmail.com
