Hi,
I'm doing some benchmarking with Solr Cloud 4.9.0. I am trying to work out
exactly how much throughput my cluster can handle.
In my tests I consistently see a replica go into the recovering state and stay
there, caused by what looks like a timeout during replication. I can understand
the timeout and the failure (I am hitting the cluster fairly hard), but what
seems odd to me is that when I stop the heavy load the replica still does not
recover on its next attempt. It seems broken forever until I manually go in,
clear the index and let it do a full resync.
Is this normal? Am I misunderstanding something? My cluster has 4 nodes (2
shards, 2 replicas) on AWS m3.2xlarge instances. I am indexing with ~800
concurrent connections and a 10-second soft commit. I consistently hit this
problem at a throughput of around 1.5 million documents per hour.
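For scale, the figures above can be converted into per-second rates. This is only back-of-the-envelope arithmetic on the approximate numbers quoted; the variable names are illustrative, and it assumes the load is spread evenly across connections, which may not hold in practice.

```python
# Approximate figures quoted in the post; everything else is derived from them.
DOCS_PER_HOUR = 1_500_000       # indexing throughput at which the problem appears
CONCURRENT_CONNECTIONS = 800    # concurrent indexing connections
SOFT_COMMIT_SECONDS = 10        # soft commit interval

# Cluster-wide rate, documents arriving per soft commit window,
# and the average rate per connection (assuming an even spread).
docs_per_second = DOCS_PER_HOUR / 3600
docs_per_soft_commit = docs_per_second * SOFT_COMMIT_SECONDS
docs_per_connection_per_second = docs_per_second / CONCURRENT_CONNECTIONS

print(f"{docs_per_second:.0f} docs/sec across the cluster")
print(f"{docs_per_soft_commit:.0f} docs per soft commit window")
print(f"{docs_per_connection_per_second:.2f} docs/sec per connection")
```

That is roughly 417 documents per second overall, or about one document every two seconds per connection.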
Thanks all,
Darren
Stack Traces & Messages:
[qtp779330563-627] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
    at org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:226)
    at org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:195)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:422)
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Error while trying to recover. core=assets_shard2_replica1:java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://xxx.xxx.15.171:8080/solr
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:615)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:371)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://xxx.xxx.15.171:8080/solr
    at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
    at org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:245)
    at org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:241)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.SocketException: Socket closed
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
    at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:452)
    ... 6 more
853915 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy – Recovery failed - trying again... (0) core=assets_shard2_replica1
853915 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy – Recovery failed - interrupted. core=assets_shard2_replica1
853915 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy – Recovery failed - I give up. core=assets_shard2_replica1
853918 [RecoveryThread] WARN org.apache.solr.cloud.RecoveryStrategy – Stopping recovery for zkNodeName=xxx.xxx.15.174:8080_solr_assets_shard2_replica1core=assets_shard2_replica1
853933 [Thread-382] WARN org.apache.solr.cloud.RecoveryStrategy – Stopping recovery for zkNodeName=xxx.xxx.15.174:8080_solr_assets_shard2_replica1core=assets_shard2_replica1