After adding a fifth node I started running NodeProbe::forceKeyspaceCleanup. That call never returns. The thread stack traces below show that in this case, and probably in other cases, a deadlock is possible.
In my case, the CompactionManager calls the synchronized method getNextBackgroundTask, which in turn calls getMaximalTask. That function is stuck in an endless loop, so the lock on the synchronized object (the LeveledCompactionStrategy instance) is never released. The cleanup request is meanwhile trying to call the synchronized pause() function on the same object, so pause(), and with it the cleanup, cannot proceed.

Configuration: 5 nodes, vnodes, replication factor 3, LCS, version 2.0.7

Daemon Thread [CompactionExecutor:72] (Suspended)
    owns: LeveledManifest (id=8722)
    owns: LeveledCompactionStrategy (id=8552)
        waited by: Daemon System Thread [RMI TCP Connection(14840)-10.164.8.73] (Suspended)
    LeveledManifest.getCompactionCandidates() line: 247
    LeveledCompactionStrategy.getMaximalTask(int) line: 121
    LeveledCompactionStrategy.getNextBackgroundTask(int) line: 113
    CompactionManager$BackgroundCompactionTask.run() line: 191
    Executors$RunnableAdapter<T>.call() line: 471
    FutureTask<V>.run() line: 262
    CompactionManager$CompactionExecutor(ThreadPoolExecutor).runWorker(ThreadPoolExecutor$Worker) line: 1145
    ThreadPoolExecutor$Worker.run() line: 615
    Thread.run() line: 745

Daemon System Thread [RMI TCP Connection(14840)-10.164.8.73] (Suspended)
    owns: ColumnFamilyStore (id=8551)
    waiting for: LeveledCompactionStrategy (id=8552)
        owned by: Daemon Thread [CompactionExecutor:72] (Suspended)
    LeveledCompactionStrategy(AbstractCompactionStrategy).pause() line: 112
    ColumnFamilyStore.runWithCompactionsDisabled(Callable<V>, boolean) line: 2056
    ColumnFamilyStore.markAllCompacting() line: 2125
    CompactionManager.performAllSSTableOperation(ColumnFamilyStore, CompactionManager$AllSSTablesOperation) line: 214
    CompactionManager.performCleanup(ColumnFamilyStore, CounterId$OneShotRenewer) line: 265
    ColumnFamilyStore.forceCleanup(CounterId$OneShotRenewer) line: 1105
    StorageService.forceKeyspaceCleanup(String, String...) line: 2215
    GeneratedMethodAccessor42.invoke(Object, Object[]) line: not available
    DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
    Method.invoke(Object, Object...) line: 606
    Trampoline.invoke(Method, Object, Object[]) line: 75
    GeneratedMethodAccessor19.invoke(Object, Object[]) line: not available
    DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
    Method.invoke(Object, Object...) line: 606
    MethodUtil.invoke(Method, Object, Object[]) line: 279
    StandardMBeanIntrospector.invokeM2(Method, Object, Object[], Object) line: 112
    StandardMBeanIntrospector.invokeM2(Object, Object, Object[], Object) line: 46
    StandardMBeanIntrospector(MBeanIntrospector<M>).invokeM(M, Object, Object[], Object) line: 237
    PerInterface<M>.invoke(Object, String, Object[], String[], Object) line: 138
    StandardMBeanSupport(MBeanSupport<M>).invoke(String, Object[], String[]) line: 252
    DefaultMBeanServerInterceptor.invoke(ObjectName, String, Object[], String[]) line: 819
    JmxMBeanServer.invoke(ObjectName, String, Object[], String[]) line: 801
    RMIConnectionImpl.doOperation(int, Object[]) line: 1487
    RMIConnectionImpl.access$300(RMIConnectionImpl, int, Object[]) line: 97
    RMIConnectionImpl$PrivilegedOperation.run() line: 1328
    RMIConnectionImpl.doPrivilegedOperation(int, Object[], Subject) line: 1420
    RMIConnectionImpl.invoke(ObjectName, String, MarshalledObject, String[], Subject) line: 848
    GeneratedMethodAccessor41.invoke(Object, Object[]) line: not available
    DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
    Method.invoke(Object, Object...) line: 606
    UnicastServerRef.dispatch(Remote, RemoteCall) line: 322
    Transport$1.run() line: 177
    Transport$1.run() line: 174
    AccessController.doPrivileged(PrivilegedExceptionAction<T>, AccessControlContext) line: not available [native method]
    TCPTransport(Transport).serviceCall(RemoteCall) line: 173
    TCPTransport.handleMessages(Connection, boolean) line: 556
    TCPTransport$ConnectionHandler.run0() line: 811
    TCPTransport$ConnectionHandler.run() line: 670
    ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1145
    ThreadPoolExecutor$Worker.run() line: 615
    Thread.run() line: 745
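
To make the locking pattern in the two stack traces easier to reason about, here is a minimal standalone sketch of it. It is not Cassandra code: the classes MonitorStarvationDemo and Strategy, the stop flag, and the latch are hypothetical names invented for the illustration. Only the method names getNextBackgroundTask and pause are borrowed from the traces, and the real methods do more than this. The sketch assumes nothing beyond what the traces show: one thread loops inside a synchronized method while a second thread tries to enter another synchronized method on the same object.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class MonitorStarvationDemo {

    // Hypothetical stand-in for the compaction strategy: both methods lock "this".
    static class Strategy {
        private final AtomicBoolean stop = new AtomicBoolean(false);
        private final CountDownLatch entered = new CountDownLatch(1);

        // Plays the role of getNextBackgroundTask(): keeps the monitor while it loops.
        synchronized void getNextBackgroundTask() {
            entered.countDown();
            // Simulated loop that never finishes on its own. The stop flag exists
            // only so that this demo can terminate.
            while (!stop.get()) {
                Thread.yield();
            }
        }

        // Plays the role of pause(): needs the same monitor before it can run.
        synchronized void pause() {
            // Nothing to do; entering the method is the point.
        }
    }

    // Starts both threads and returns the state of the thread calling pause().
    static Thread.State observePauseState() {
        Strategy strategy = new Strategy();
        Thread compaction = new Thread(strategy::getNextBackgroundTask, "compaction-demo");
        Thread cleanup = new Thread(strategy::pause, "cleanup-demo");
        compaction.setDaemon(true);
        cleanup.setDaemon(true);
        try {
            compaction.start();
            strategy.entered.await(); // the monitor is now held by the looping thread
            cleanup.start();

            // Poll for up to two seconds until the cleanup thread is stuck on the monitor.
            long deadline = System.nanoTime() + 2_000_000_000L;
            Thread.State state = cleanup.getState();
            while (state != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(10);
                state = cleanup.getState();
            }
            return state;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            strategy.stop.set(true); // end the loop so the monitor is released
        }
    }

    public static void main(String[] args) {
        System.out.println("cleanup thread state: " + observePauseState());
    }
}
```

In this sketch the cleanup thread stays in Thread.State.BLOCKED for as long as the loop runs, which matches the "waiting for: LeveledCompactionStrategy" entry in the RMI thread above. Strictly speaking only one lock is contended, so this is a thread that never releases its monitor rather than a classic two-lock deadlock, but the effect on the caller is the same: the call never returns.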