Surbi: If you aren’t seeing connection activity in DC2, I’d check whether the operations hitting DC1 are QUORUM ops rather than LOCAL_QUORUM. That still wouldn’t explain DC2 nodes going down, but it would at least explain them doing more work than might be on your radar right now.
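To make the quorum point concrete, here is a rough sketch of the arithmetic. The replication factors (3 per DC) and the function names are assumptions for illustration, not something taken from your cluster. A quorum is floor(n / 2) + 1 replicas, where n is the sum of the replication factors across all DCs for QUORUM and only the local DC's replication factor for LOCAL_QUORUM.

```python
def quorum(replicas: int) -> int:
    """Replicas that must respond for a quorum: floor(n / 2) + 1."""
    return replicas // 2 + 1


def required_acks(rf_by_dc: dict[str, int], local_dc: str) -> dict[str, int]:
    """Compare how many replica responses QUORUM and LOCAL_QUORUM need."""
    total_rf = sum(rf_by_dc.values())
    local_rf = rf_by_dc[local_dc]
    need_quorum = quorum(total_rf)
    return {
        "QUORUM": need_quorum,
        "LOCAL_QUORUM": quorum(local_rf),
        # Responses that cannot be served from the local DC alone.
        "QUORUM_min_remote": max(0, need_quorum - local_rf),
    }


if __name__ == "__main__":
    # Hypothetical replication settings; substitute your keyspace's values.
    print(required_acks({"DC1": 3, "DC2": 3}, local_dc="DC1"))
    # prints {'QUORUM': 4, 'LOCAL_QUORUM': 2, 'QUORUM_min_remote': 1}
```

With those assumed settings, every QUORUM request coordinated in DC1 has to wait on at least one DC2 replica, while LOCAL_QUORUM can be satisfied entirely inside DC1.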
The slow hint replay sounds to me like you could be fighting GC. You mentioned bumping the DC2 nodes to 32 GB. You might have already been doing this, but if not, be sure to stay under 32 GB, e.g. 31 GB. Otherwise the JVM switches to larger (uncompressed) object pointers and you could actually end up with less effective ability to allocate memory.

As the problem is only happening in DC2, there has to be something that is true in DC2 that isn’t true in DC1: a difference in hardware, in O/S version, in networking config or physical infrastructure, in client-triggered activity, or in how repairs are handled. Somewhere, there is a difference. I’d start by focusing on that.

From: Erick Ramirez <erick.rami...@datastax.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Saturday, April 4, 2020 at 8:28 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: OOM only on one datacenter nodes

With a lack of heapdump for you to analyse, my hypothesis is that your DC2 nodes are taking on traffic (from some client somewhere) but you're just not aware of it. The hints replay is just a side-effect of the nodes getting overloaded.

To rule out my hypothesis in the first instance, my recommendation is to monitor the incoming connections to the nodes in DC2. If you don't have monitoring in place, you could simply run netstat at regular intervals and go from there.

Cheers!
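As a starting point for the netstat suggestion, here is a minimal sketch that counts established client connections per remote IP from the text printed by `netstat -tn` on Linux. It assumes clients connect on 9042, Cassandra's default native transport port; the function name and the sample addresses are made up for illustration.

```python
from collections import Counter

CQL_PORT = 9042  # Cassandra's default native transport port; adjust if yours differs


def count_client_connections(netstat_output: str, port: int = CQL_PORT) -> Counter:
    """Count ESTABLISHED TCP connections to the local `port`, grouped by remote IP.

    Expects the text produced by `netstat -tn` on Linux.
    """
    counts: Counter = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: tcp 0 0 local:port remote:port STATE
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        counts[remote.rsplit(":", 1)[0]] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample output for illustration only.
    sample = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.2.11:9042          10.0.1.50:51234         ESTABLISHED
tcp        0      0 10.0.2.11:9042          10.0.1.50:51240         ESTABLISHED
tcp        0      0 10.0.2.11:7000          10.0.2.12:40022         ESTABLISHED
tcp        0      0 10.0.2.11:9042          10.0.3.7:60110          TIME_WAIT
"""
    print(count_client_connections(sample))
```

On a live DC2 node you could feed it the output of `subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout` from a cron job or a loop and log the counts over time. Any remote IP that shows up there is a client talking to DC2 directly.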