We have a four-node, two-DC test cluster, with DataStax Enterprise installed and running on all nodes. One DC is the Cassandra DC and the other is the Solr DC. We first used sstableloader to stream 1 billion rows into the cluster, and after that was done we created a Solr core using resource auto-generation.
Indexing ran fine for a while, and then one of the Solr nodes went down. The system log shows this:

ERROR [NonPeriodicTasks:1] 2016-05-31 17:47:36,560  CassandraDaemon.java:229 - Exception in thread Thread[NonPeriodicTasks:1,5,main]
java.lang.RuntimeException: Timeout while waiting for workers when flushing pool zootopia.ltb Index; current timeout is 300000 millis, consider increasing it, or reducing load on the node.
Failure to flush may cause excessive growth of Cassandra commit log.
        at com.datastax.bdp.search.solr.core.CassandraCoreContainer.doShutdown(CassandraCoreContainer.java:1081) ~[dse-search-4.8.7.jar:4.8.7]
        at com.datastax.bdp.search.solr.core.CassandraCoreContainer.access$100(CassandraCoreContainer.java:99) ~[dse-search-4.8.7.jar:4.8.7]
        at com.datastax.bdp.search.solr.core.CassandraCoreContainer$1.run(CassandraCoreContainer.java:626) ~[dse-search-4.8.7.jar:4.8.7]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_91]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_91]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_91]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_91]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_91]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
Caused by: java.util.concurrent.TimeoutException: Timeout while waiting for workers when flushing pool zootopia.ltb Index; current timeout is 300000 millis, consider increasing it, or reducing load on the node.
Failure to flush may cause excessive growth of Cassandra commit log.
        at com.datastax.bdp.concurrent.WorkPool.doFlushError(WorkPool.java:598) ~[dse-core-4.8.7.jar:4.8.7]
        at com.datastax.bdp.concurrent.WorkPool.doTwoPhaseFlush(WorkPool.java:559) ~[dse-core-4.8.7.jar:4.8.7]
        at com.datastax.bdp.concurrent.WorkPool.doFlush(WorkPool.java:523) ~[dse-core-4.8.7.jar:4.8.7]
        at com.datastax.bdp.concurrent.WorkPool.shutdown(WorkPool.java:461) ~[dse-core-4.8.7.jar:4.8.7]
        at com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex.shutdownIndexUpdates(AbstractSolrSecondaryIndex.java:534) ~[dse-search-4.8.7.jar:4.8.7]
        at com.datastax.bdp.search.solr.core.CassandraCoreContainer.doShutdown(CassandraCoreContainer.java:1076) ~[dse-search-4.8.7.jar:4.8.7]
        ... 9 common frames omitted
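If it helps, the 300000 millis in the message matches the default DSE Search flush window, which as far as I can tell is controlled by flush_max_time_per_core in dse.yaml (value in minutes, default 5, i.e. the 300000 ms shown). One thing we could try is raising it, e.g.:

        # dse.yaml -- assuming this is the setting behind the 300000 ms timeout above
        # value is in minutes; the default of 5 corresponds to 300000 millis
        flush_max_time_per_core: 10

but I'm not sure whether that just masks the underlying problem.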


When I restarted the Solr nodes, the number of docs was very small, much smaller than the last count we saw before the node went down...

I saw that this was a known issue in DSE 4.8 that was resolved in 4.8.1, but we are running 4.8.7...

Any ideas on whether it's still the same deadlock issue or something else?

Thanks,
Charles
