I remember we had the same issue back in the Cassandra 2.x days, and
restarting the affected node only made the issue go away temporarily.
The issue we had was "fixed" by adding
"-Dcassandra.max_queued_native_transport_requests=4096" to the JVM
options. I dug that option out of our old Ansible playbook. After so
many years, I've long forgotten what that option does.
Please seriously consider upgrading your Cassandra cluster to the latest
version. I can't tell which exact version fixed this bug, but we removed
this option from our servers many years ago after several rounds of
upgrades, and the NTR pool blocking issue has not come back.
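If it helps to confirm that a node is in the same state, here is a rough sketch of a check on the "nodetool tpstats" output. It assumes the usual column layout (pool name, Active, Pending, Completed, Blocked, All time blocked), which may differ between versions. The function names are made up for illustration and are not part of any Cassandra tooling.

```python
import subprocess

NTR_POOL = "Native-Transport-Requests"


def parse_ntr_row(tpstats_text):
    """Return the NTR pool counters as a dict, or None if the row is absent.

    Assumes each pool row is: name, active, pending, completed,
    blocked, all-time blocked (whitespace separated).
    """
    for line in tpstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0] == NTR_POOL:
            active, pending, completed, blocked, all_time_blocked = map(
                int, fields[1:6]
            )
            return {
                "active": active,
                "pending": pending,
                "completed": completed,
                "blocked": blocked,
                "all_time_blocked": all_time_blocked,
            }
    return None


def ntr_is_blocking(row):
    """True if the NTR pool currently reports blocked tasks."""
    return row is not None and row["blocked"] > 0


def read_tpstats():
    """Run 'nodetool tpstats' locally and return its stdout as text."""
    result = subprocess.run(
        ["nodetool", "tpstats"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a node with nodetool on the PATH, ntr_is_blocking(parse_ntr_row(read_tpstats())) gives a quick yes/no that could be run periodically. The parsing is kept separate from the nodetool call so it can also be fed saved output.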
On 23/03/2022 04:29, Jaydeep Chovatia wrote:
Hi,
I have been using Cassandra 3.0.14 in production for a long time.
Recently I found a bug in which the transport thread pool suddenly
hangs.
*_Observation:_*
If I run /nodetool tpstats/, it shows that
/"Native-Transport-Requests"/ is blocking "Active" tasks. I stopped
all traffic and sent a very light load, but my requests are still
being denied, and the count of blocked transport tasks keeps
growing.
_*Fix:*_
If I restart my cluster, then everything works fine, which suggests
there might be a deadlock or something similar in the system.
Is anyone aware of this issue? I know there have been quite a lot of
fixes on top of 3.0.14. Is there any specific fix that addresses this
particular issue?
Any help would be appreciated.
Yours Sincerely,
Jaydeep