[
https://issues.apache.org/jira/browse/SOLR-11881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16457273#comment-16457273
]
Tomás Fernández Löbbe commented on SOLR-11881:
----------------------------------------------
If I’m reading the code correctly, this RetryHandler is called in two cases,
after an exception trying to establish the connection, and after an exception
executing the request. Retrying in the first case should be fine, in the
second, not so easy.
The way we do streaming is by keeping the connection and having an
{{EntityTemplate}} that reads updates from a queue and writes each one and
flushes. If I understand correctly, if we want to retry the request instead of
throwing an error we need to retry the update too, by putting it back in the
queue. Even then, I’m not sure we can know that updates before the failure were
consumed (even if we call flush for each one)
> Connection Reset Causing LIR
> ----------------------------
>
> Key: SOLR-11881
> URL: https://issues.apache.org/jira/browse/SOLR-11881
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Varun Thacker
> Assignee: Varun Thacker
> Priority: Major
> Attachments: SOLR-11881.patch, SOLR-11881.patch
>
>
> We can see that a connection reset is causing LIR.
> If a leader -> replica update get's a connection like this the leader will
> initiate LIR
> {code}
> 2018-01-08 17:39:16.980 ERROR (qtp1010030701-468988) [c:person s:shard7_1
> r:core_node56 x:person_shard7_1_replica1]
> o.a.s.u.p.DistributedUpdateProcessor Setting up to try to start recovery on
> replica https://host08.domain:8985/solr/person_shard7_1_replica2/
> java.net.SocketException: Connection reset
> at java.net.SocketInputStream.read(SocketInputStream.java:210)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
> at sun.security.ssl.InputRecord.read(InputRecord.java:503)
> at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
> at
> sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
> at
> sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
> at
> sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
> at
> org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:543)
> at
> org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:409)
> at
> org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:177)
> at
> org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304)
> at
> org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:611)
> at
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:446)
> at
> org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
> at
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
> at
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
> at
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:312)
> at
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185)
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> From https://issues.apache.org/jira/browse/SOLR-6931 Mark says "On a heavy
> working SolrCloud cluster, even a rare response like this from a replica can
> cause a recovery and heavy cluster disruption" .
> Looking at SOLR-6931 we added a http retry handler but we only retry on GET
> requests. Updates are POST requests
> {{ConcurrentUpdateSolrClient#sendUpdateStream}}
> Update requests between the leader and replica should be retry-able since
> they have been versioned.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]