[
https://issues.apache.org/jira/browse/SOLR-9050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Timothy Potter resolved SOLR-9050.
----------------------------------
Resolution: Cannot Reproduce
> IndexFetcher not retrying after SocketTimeoutException correctly, which leads to trying a full download again
> -------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-9050
> URL: https://issues.apache.org/jira/browse/SOLR-9050
> Project: Solr
> Issue Type: Bug
> Components: replication (java)
> Affects Versions: 5.3.1
> Reporter: Timothy Potter
> Assignee: Timothy Potter
> Attachments: SOLR-9050.patch, SOLR-9050.patch
>
>
> I'm seeing a problem where reading a large file from the leader (in SolrCloud
> mode) during index replication leads to a SocketTimeoutException:
> {code}
> 2016-04-28 16:22:23.568 WARN (RecoveryThread-foo_shard11_replica2) [c:foo s:shard11 r:core_node139 x:foo_shard11_replica2] o.a.s.h.IndexFetcher Error in fetching file: _405k.cfs (downloaded 7314866176 of 9990844536 bytes)
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:150)
> at java.net.SocketInputStream.read(SocketInputStream.java:121)
> at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
> at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
> at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
> at org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:253)
> at org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:227)
> at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:186)
> at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
> at org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:80)
> at org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:89)
> at org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:140)
> at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:167)
> at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:161)
> at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1312)
> at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1275)
> at org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:800)
> {code}
> and this leads to the following error in cleanup:
> {code}
> 2016-04-28 16:26:04.332 ERROR (RecoveryThread-foo_shard11_replica2) [c:foo s:shard11 r:core_node139 x:foo_shard11_replica2] o.a.s.h.ReplicationHandler Index fetch failed :org.apache.solr.common.SolrException: Unable to download _405k.cfs completely. Downloaded 7314866176!=9990844536
> at org.apache.solr.handler.IndexFetcher$FileFetcher.cleanup(IndexFetcher.java:1406)
> at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1286)
> at org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:800)
> at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:423)
> at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:254)
> at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:380)
> at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:162)
> at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
> at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
> 2016-04-28 16:26:04.332 ERROR (RecoveryThread-foo_shard11_replica2) [c:foo s:shard11 r:core_node139 x:foo_shard11_replica2] o.a.s.c.RecoveryStrategy Error while trying to recover:org.apache.solr.common.SolrException: Replication for recovery failed.
> at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:165)
> at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
> at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
> {code}
> So a simple read timeout leads to re-downloading the whole index again, and again, and again ...
> It also looks like any exception raised in fetchPackets would be squelched if an exception is raised in cleanup (which is called from the finally block).
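The squelching the description mentions is standard Java try/finally behavior: if the finally block itself throws, that exception replaces the one propagating from the try body. A minimal sketch (not Solr's actual code; the method names just mirror the ones in the stack trace):

```java
// Demonstrates how an exception thrown by cleanup() in a finally block
// masks the original exception from fetchPackets().
public class MaskedExceptionDemo {
    static void fetchPackets() {
        // stands in for the real read timeout during the fetch
        throw new RuntimeException("SocketTimeoutException: Read timed out");
    }

    static void cleanup() {
        // stands in for the size-mismatch check in cleanup
        throw new RuntimeException("Unable to download file completely");
    }

    static void fetchFile() {
        try {
            fetchPackets();
        } finally {
            cleanup(); // this exception replaces the one from fetchPackets()
        }
    }

    public static void main(String[] args) {
        try {
            fetchFile();
        } catch (RuntimeException e) {
            // Only the cleanup exception surfaces; the root cause is lost.
            System.out.println(e.getMessage()); // prints: Unable to download file completely
        }
    }
}
```

One conventional fix is to catch the cleanup exception and attach it to the original via Throwable.addSuppressed, so the root-cause timeout still reaches the log.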
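For the retry side, the behavior the report wants could be sketched as resuming the single file from the last good offset instead of failing the whole index pull. This is a hypothetical outline, not Solr's IndexFetcher API; the interface, constant, and method names are all invented for illustration:

```java
// Hypothetical sketch: retry a single-file fetch from the last good byte
// offset after a read timeout, rather than restarting the entire download.
public class ResumableFetch {
    static final int MAX_RETRIES = 3; // assumed retry budget

    /** One fetch attempt starting at the given offset; returns bytes read. */
    interface FetchAttempt {
        long fetchFrom(long offset) throws Exception;
    }

    static long fetchWithRetry(long totalSize, FetchAttempt fetcher) throws Exception {
        long downloaded = 0;
        int retries = 0;
        while (downloaded < totalSize) {
            try {
                downloaded += fetcher.fetchFrom(downloaded);
            } catch (java.net.SocketTimeoutException e) {
                if (++retries > MAX_RETRIES) {
                    throw e; // give up only after exhausting the budget
                }
                // otherwise loop and resume from `downloaded`,
                // keeping the bytes already written
            }
        }
        return downloaded;
    }
}
```

The key design point is that a transient SocketTimeoutException never discards already-downloaded bytes; only after the retry budget is spent does the failure propagate up to the caller.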
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]