Surbi:

If you aren’t seeing connection activity in DC2, I’d check whether the 
operations hitting DC1 are using QUORUM instead of LOCAL_QUORUM.  That still 
wouldn’t explain DC2 nodes going down, but it would at least explain them doing 
more work than might be on your radar right now.
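
To make the QUORUM vs. LOCAL_QUORUM point concrete, here is a minimal sketch of 
the replica arithmetic. The replication factors (3 per DC) are an assumption 
for illustration only; substitute the values from your own keyspace.

```python
def quorum(replicas: int) -> int:
    """Replicas that must respond for a quorum: floor(n / 2) + 1."""
    return replicas // 2 + 1


# Hypothetical replication settings (assumed, not taken from this thread).
replication = {"DC1": 3, "DC2": 3}

# LOCAL_QUORUM only counts replicas in the coordinator's own datacenter.
local_quorum_dc1 = quorum(replication["DC1"])

# QUORUM counts replicas across every datacenter in the keyspace.
total_replicas = sum(replication.values())
global_quorum = quorum(total_replicas)

# If QUORUM needs more replicas than DC1 holds, every such request
# has to wait on at least one replica in DC2.
needs_remote = global_quorum > replication["DC1"]

print(f"LOCAL_QUORUM in DC1: {local_quorum_dc1} of {replication['DC1']} local replicas")
print(f"QUORUM: {global_quorum} of {total_replicas} replicas across all DCs")
print(f"QUORUM requests from DC1 must involve DC2: {needs_remote}")
```

Under those assumed settings a QUORUM request cannot be satisfied by DC1 alone, 
so DC2 would be on the request path even with no clients connected to it.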

To me, the slow hint replay sounds like you could be fighting GC.

You mentioned bumping the DC2 nodes to 32 GB.  You might have already been doing 
this, but if not, be sure to stay under 32 GB, for example 31 GB.  Otherwise the 
JVM can no longer use compressed object pointers, and the larger pointers can 
actually leave you with less effective room to allocate memory.
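
As a rough illustration of that heap-size rule, the sketch below checks whether 
a JVM-style max heap value stays under the compressed-pointer limit. The 32 GiB 
figure is an approximation that assumes the default 8-byte object alignment, 
and the helper names are hypothetical.

```python
import re

# Approximate limit (assumption: default 8-byte object alignment). The exact
# cut-off on a given JVM can sit slightly below this value.
COMPRESSED_OOPS_LIMIT_BYTES = 32 * 1024**3

_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_heap_size(value: str) -> int:
    """Convert a JVM-style size such as '31G' or '8192m' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgGtT]?)", value.strip())
    if match is None:
        raise ValueError(f"unrecognised heap size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def keeps_compressed_oops(max_heap: str) -> bool:
    """True if the heap is small enough for compressed object pointers."""
    return parse_heap_size(max_heap) < COMPRESSED_OOPS_LIMIT_BYTES


for setting in ("31G", "32G"):
    print(setting, "-> compressed pointers likely:", keeps_compressed_oops(setting))
```

To confirm what a node is really doing, it is probably more reliable to ask the 
JVM itself, for instance by running java with your -Xmx value plus 
-XX:+PrintFlagsFinal -version and looking at UseCompressedOops.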

Since the problem is only happening in DC2, there has to be something that is 
true in DC2 that isn’t true in DC1: a difference in hardware, a difference in 
O/S version, a difference in networking config or physical infrastructure, a 
difference in client-triggered activity, or a difference in how repairs are 
handled.  Somewhere, there is a difference.  I’d start by focusing on that.


From: Erick Ramirez <erick.rami...@datastax.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Saturday, April 4, 2020 at 8:28 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: OOM only on one datacenter nodes

Without a heap dump for you to analyse, my hypothesis is that your DC2 
nodes are taking on traffic (from some client somewhere) but you're just not 
aware of it. The hints replay is just a side-effect of the nodes getting 
overloaded.

To rule out my hypothesis in the first instance, my recommendation is to 
monitor the incoming connections to the nodes in DC2. If you don't have 
monitoring in place, you could simply run netstat at regular intervals and go 
from there. Cheers!
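
One way to act on the netstat suggestion is sketched below: it counts 
established client connections per remote host from `netstat -tn` style output. 
Port 9042 is the default native transport port and may differ on your cluster; 
the sample text and its addresses are invented for illustration, and the 
function name is hypothetical.

```python
from collections import Counter

CQL_PORT = "9042"  # default native transport port; adjust if yours differs


def count_clients(netstat_output: str, port: str = CQL_PORT) -> Counter:
    """Count ESTABLISHED connections to `port`, grouped by remote host."""
    clients = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP connection row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state == "ESTABLISHED" and local.rsplit(":", 1)[-1] == port:
            clients[remote.rsplit(":", 1)[0]] += 1
    return clients


# Invented sample in the shape of Linux `netstat -tn` output.
sample = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.2.11:9042          10.0.1.50:51234         ESTABLISHED
tcp        0      0 10.0.2.11:9042          10.0.1.50:51240         ESTABLISHED
tcp        0      0 10.0.2.11:9042          10.0.1.51:40022         ESTABLISHED
tcp        0      0 10.0.2.11:7000          10.0.2.12:38110         ESTABLISHED
tcp        0      0 10.0.2.11:9042          10.0.1.52:40100         TIME_WAIT
"""

for host, connections in count_clients(sample).most_common():
    print(host, connections)
```

On a real node the text could come from something like 
subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout, 
sampled at regular intervals so that unexpected client hosts stand out.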

GOT QUESTIONS? Apache Cassandra experts from the community and DataStax have 
answers! Share your expertise on 
https://community.datastax.com/.
