We have a four-node, two-DC test cluster, with DataStax Enterprise (DSE) installed and running on every node. One DC is the Cassandra DC and the other is the Solr DC. We first used sstableloader to stream 1 billion rows into the cluster, and once that finished we created a Solr core using resource auto-generation, roughly as shown below (the host and data path are placeholders; the keyspace/table is zootopia.ltb, as in the log):
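    # stream the pre-built SSTables into the cluster
    sstableloader -d <cassandra_node> /path/to/zootopia/ltb

    # create the Solr core with auto-generated schema and solrconfig
    dsetool create_core zootopia.ltb generateResources=true reindex=true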
Indexing ran fine for a while, and then one of the Solr nodes went down. The system log shows this:

ERROR [NonPeriodicTasks:1] 2016-05-31 17:47:36,560 CassandraDaemon.java:229 - Exception in thread Thread[NonPeriodicTasks:1,5,main]
java.lang.RuntimeException: Timeout while waiting for workers when flushing pool zootopia.ltb Index; current timeout is 300000 millis, consider increasing it, or reducing load on the node. Failure to flush may cause excessive growth of Cassandra commit log.
    at com.datastax.bdp.search.solr.core.CassandraCoreContainer.doShutdown(CassandraCoreContainer.java:1081) ~[dse-search-4.8.7.jar:4.8.7]
    at com.datastax.bdp.search.solr.core.CassandraCoreContainer.access$100(CassandraCoreContainer.java:99) ~[dse-search-4.8.7.jar:4.8.7]
    at com.datastax.bdp.search.solr.core.CassandraCoreContainer$1.run(CassandraCoreContainer.java:626) ~[dse-search-4.8.7.jar:4.8.7]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_91]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_91]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_91]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_91]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_91]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
Caused by: java.util.concurrent.TimeoutException: Timeout while waiting for workers when flushing pool zootopia.ltb Index; current timeout is 300000 millis, consider increasing it, or reducing load on the node. Failure to flush may cause excessive growth of Cassandra commit log.
    at com.datastax.bdp.concurrent.WorkPool.doFlushError(WorkPool.java:598) ~[dse-core-4.8.7.jar:4.8.7]
    at com.datastax.bdp.concurrent.WorkPool.doTwoPhaseFlush(WorkPool.java:559) ~[dse-core-4.8.7.jar:4.8.7]
    at com.datastax.bdp.concurrent.WorkPool.doFlush(WorkPool.java:523) ~[dse-core-4.8.7.jar:4.8.7]
    at com.datastax.bdp.concurrent.WorkPool.shutdown(WorkPool.java:461) ~[dse-core-4.8.7.jar:4.8.7]
    at com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex.shutdownIndexUpdates(AbstractSolrSecondaryIndex.java:534) ~[dse-search-4.8.7.jar:4.8.7]
    at com.datastax.bdp.search.solr.core.CassandraCoreContainer.doShutdown(CassandraCoreContainer.java:1076) ~[dse-search-4.8.7.jar:4.8.7]
    ... 9 common frames omitted

When I restarted the Solr nodes, the number of indexed documents was very small, much smaller than the last count we saw before the node went down. I saw this was a known issue in DSE 4.8 that was resolved in 4.8.1, but we are running 4.8.7. Any ideas on whether this is still the same deadlock issue or something else?

Thanks,
Charles
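P.S. In case it is relevant: the 300000 ms in the error looks like the default flush_max_time_per_core setting in dse.yaml (5 minutes), so as a stopgap we are considering raising it, e.g.:

    # dse.yaml - max time (in minutes) to wait for asynchronous index updates to flush
    flush_max_time_per_core: 10

I'd still like to understand why the flush stalls in the first place, though, since the error message itself suggests the alternative is reducing load on the node rather than just waiting longer.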