Thanks everyone for the responses. How can I debug the GC issue further? Are there any known GC issues in 3.11.0?
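So far the only lever I know of is GC logging; a minimal sketch of what I plan to enable, assuming the stock 3.11 jvm.options (these Java 8 flags ship there commented out) and default package-install log paths:

    # conf/jvm.options -- uncomment the GC logging block, then restart the node
    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -XX:+PrintHeapAtGC
    -XX:+PrintTenuringDistribution
    -XX:+PrintGCApplicationStoppedTime
    -XX:+PrintPromotionFailure
    -Xloggc:/var/log/cassandra/gc.log
    -XX:+UseGCLogFileRotation
    -XX:NumberOfGCLogFiles=10
    -XX:GCLogFileSize=10M

Beyond that, "nodetool gcstats" (pause statistics since it was last run) and grepping system.log for GCInspector lines look like quick checks that need no restart.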
On Thu, Feb 27, 2020 at 8:46 AM Reid Pinchback <rpinchb...@tripadvisor.com> wrote:

> Our experience with G1GC was that 31gb wasn’t optimal (for us) because
> while you have less frequent full GCs, they are bigger when they do
> happen. But even so, not to the point of a 9.5s full collection.
>
> Unless it is a rare event associated with something weird happening
> outside of the JVM (there are some whacky interactions between memory and
> dirty page writing that could cause it, but not typically), then that is
> evidence of a really tough fight to reclaim memory. There are a lot of
> things that can impact garbage collection performance. Something is either
> being pushed very hard, or something is being constrained very tightly
> compared to resource demand.
>
> I’m with Erick, I wouldn’t be putting my attention right now on anything
> but the GC issue. Everything else that happens within the JVM envelope is
> going to be a misread on timing until you have stable garbage collection.
> You might have other issues later, but you aren’t going to know what those
> are yet.
>
> One thing you could at least try to eliminate quickly as a factor: are
> repairs running at the time that things are slow? Prior to 3.11.5 you lack
> one of the tuning knobs for doing a tradeoff on memory vs network
> bandwidth when doing repairs.
>
> I’d also make sure you have tuned C* to migrate whatever you reasonably
> can to be off-heap.
>
> Another thought for surprise demands on memory. I don’t know if this is
> in 3.11.0; you’ll have to check the C* bash scripts for launching the
> service. The number of malloc arenas hasn’t always been curtailed, and
> that could result in an explosion in memory demand. I just don’t recall
> where in C* version history that was addressed.
>
> *From: *Erick Ramirez <erick.rami...@datastax.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Wednesday, February 26, 2020 at 9:55 PM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: Hints replays very slow in one DC
>
>> Nodes are going down due to Out of Memory and we are using 31GB heap
>> size in DC1; however, DC2 (which serves the traffic) has 16GB heap.
>> The reason we had to increase the heap in DC1 is that DC1 nodes were
>> going down due to the Out of Memory issue, but DC2 nodes never went down.
>
> It doesn't sound right that the primary DC is DC2 but DC1 is under load.
> You might not be aware of it, but the symptom suggests DC1 is getting hit
> with lots of traffic. If you run netstat (or whatever utility/tool of your
> choice), you should see established connections to the cluster. That
> should give you clues as to where it's coming from.
>
>> We also noticed the below kind of messages in system.log:
>>
>>     FailureDetector.java:288 - Not marking nodes down due to local pause of 9532654114 > 5000000000
>
> That's another smoking gun that the nodes are buried in GC. A 9.5-second
> pause is significant. The slow hinted handoffs are really the least of
> your problems right now. If nodes weren't going down, there wouldn't be
> hints to hand off in the first place. Cheers!
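On Reid's repair question, this is what I plan to check with standard nodetool commands when the slowness hits (the 3.11.5 knob he alludes to is, I believe, repair_session_space_in_mb from CASSANDRA-14096, though I may be misremembering):

    nodetool compactionstats    # "Validation" tasks here mean a repair is validating data
    nodetool netstats           # shows active streaming sessions, including repair streams
    nodetool tpstats | egrep -i "antientropy|validation"   # repair pools with active/pending work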
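For the off-heap and malloc-arena suggestions, a sketch of where I'd look. The yaml setting exists in stock 3.11; the value shown is illustrative and worth testing before adopting, and the script path varies by install:

    # cassandra.yaml -- keep memtable contents off-heap
    memtable_allocation_type: offheap_objects

    # check whether the launch scripts already cap glibc malloc arenas
    grep MALLOC_ARENA_MAX /etc/cassandra/cassandra-env.sh
    # if absent, it can be exported in the environment before startup, e.g.
    export MALLOC_ARENA_MAX=4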
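And for Erick's traffic check, a one-liner sketch, assuming clients connect on the default native transport port 9042 (adjust for your ports):

    # count ESTABLISHED client connections per remote IP
    netstat -tan | awk '$4 ~ /:9042$/ && $6 == "ESTABLISHED" {split($5, a, ":"); print a[1]}' \
        | sort | uniq -c | sort -rn

Running it on a few DC1 nodes should show which source IPs dominate, i.e. where the unexpected traffic originates.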