[
https://issues.apache.org/jira/browse/SOLR-9945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cao Manh Dat updated SOLR-9945:
-------------------------------
Description:
When a node is recovering, the leader can meet an exception when trying to send
an update to the buffering node. So the leader will try running LIR process:
first set the node's state to DOWN, then send recovery OP to the node.
In the same time, PrepRecoveryOp will make the leader wait for a very long time
to see the node's state is RECOVERING.
This scenario can easily be achieved by using this test
{code}
String collection = "collection2";
CollectionAdminRequest
.createCollection(collection, "config", 1, 2)
.setMaxShardsPerNode(1)
.process(cluster.getSolrClient());
AbstractDistribZkTestBase.waitForRecoveriesToFinish(collection,
cluster.getSolrClient().getZkStateReader(),
false, true, 30);
CloudSolrClient cloudClient = cluster.getSolrClient();
DocCollection docCollection =
cloudClient.getZkStateReader().getClusterState().getCollection(collection);
Slice slice = docCollection.getSlice("shard1");
Replica replicaNode = slice.getReplicas(replica -> replica !=
slice.getLeader()).get(0);
JettySolrRunner replicaRunner = cluster.getReplicaJetty(replicaNode);
new UpdateRequest()
.add(sdoc("id", "1"))
.process(cloudClient, collection);
ChaosMonkey.stop(replicaRunner);
new UpdateRequest()
.add(sdoc("id", "2"))
.process(cloudClient, collection);
ChaosMonkey.start(replicaRunner);
new UpdateRequest()
.add(sdoc("id", "3"))
.process(cloudClient, collection);
AbstractDistribZkTestBase.waitForRecoveriesToFinish(collection,
cluster.getSolrClient().getZkStateReader(),
false, true, 60);
CollectionAdminRequest
.deleteCollection(collection)
.process(cloudClient);
{code}
Here are some log
{code}
10942 INFO (searcherExecutor-53-thread-1-processing-n:127.0.0.1:36014_solr
x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4)
[n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4
x:collection1_shard2_replica2] o.a.s.c.SolrCore [collection1_shard2_replica2]
Registered new searcher Searcher@510d1ea5[collection1_shard2_replica2]
main{ExitableDirectoryReader(UninvertingDirectoryReader())}
10942 INFO (searcherExecutor-56-thread-1-processing-n:127.0.0.1:36014_solr
x:collection1_shard1_replica2 s:shard1 c:collection1 r:core_node3)
[n:127.0.0.1:36014_solr c:collection1 s:shard1 r:core_node3
x:collection1_shard1_replica2] o.a.s.c.SolrCore [collection1_shard1_replica2]
Registered new searcher Searcher@505d7839[collection1_shard1_replica2]
main{ExitableDirectoryReader(UninvertingDirectoryReader())}
10958 INFO (coreZkRegister-47-thread-1-processing-n:127.0.0.1:36014_solr
x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4)
[n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4
x:collection1_shard2_replica2] o.a.s.c.ZkController Core needs to
recover:collection1_shard2_replica2
10964 INFO (updateExecutor-20-thread-1-processing-n:127.0.0.1:36014_solr
x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4)
[n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4
x:collection1_shard2_replica2] o.a.s.u.DefaultSolrCoreState Running recovery
10970 INFO (recoveryExecutor-21-thread-1-processing-n:127.0.0.1:36014_solr
x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4)
[n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4
x:collection1_shard2_replica2] o.a.s.c.RecoveryStrategy Starting recovery
process. recoveringAfterStartup=true
10971 INFO (recoveryExecutor-21-thread-1-processing-n:127.0.0.1:36014_solr
x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4)
[n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4
x:collection1_shard2_replica2] o.a.s.c.RecoveryStrategy ######
startupVersions=[[-1556034709770731520]]
10971 INFO (recoveryExecutor-21-thread-1-processing-n:127.0.0.1:36014_solr
x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4)
[n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4
x:collection1_shard2_replica2] o.a.s.c.RecoveryStrategy Begin buffering
updates. core=[collection1_shard2_replica2]
10971 INFO (recoveryExecutor-21-thread-1-processing-n:127.0.0.1:36014_solr
x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4)
[n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4
x:collection1_shard2_replica2] o.a.s.u.UpdateLog Starting to buffer updates.
FSUpdateLog{state=ACTIVE, tlog=null}
10971 INFO (recoveryExecutor-21-thread-1-processing-n:127.0.0.1:36014_solr
x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4)
[n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4
x:collection1_shard2_replica2] o.a.s.c.RecoveryStrategy Publishing state of
core [collection1_shard2_replica2] as recovering, leader is
[http://127.0.0.1:42549/solr/collection1_shard2_replica1/] and I am
[http://127.0.0.1:36014/solr/collection1_shard2_replica2/]
11129 INFO (recoveryExecutor-21-thread-1-processing-n:127.0.0.1:36014_solr
x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4)
[n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4
x:collection1_shard2_replica2] o.a.s.c.RecoveryStrategy Sending prep recovery
command to [http://127.0.0.1:42549/solr]; [WaitForState:
action=PREPRECOVERY&core=collection1_shard2_replica1&nodeName=127.0.0.1:36014_solr&coreNodeName=core_node4&state=recovering&checkLive=true&onlyIfLeader=true&onlyIfLeaderActive=true]
11131 INFO (zkCallback-12-thread-2-processing-n:127.0.0.1:42549_solr)
[n:127.0.0.1:42549_solr ] o.a.s.c.c.ZkStateReader A cluster state change:
[WatchedEvent state:SyncConnected type:NodeDataChanged
path:/collections/collection1/state.json] for collection [collection1] has
occurred - updating... (live nodes size: [2])
11131 INFO (zkCallback-23-thread-1-processing-n:127.0.0.1:36014_solr)
[n:127.0.0.1:36014_solr ] o.a.s.c.c.ZkStateReader A cluster state change:
[WatchedEvent state:SyncConnected type:NodeDataChanged
path:/collections/collection1/state.json] for collection [collection1] has
occurred - updating... (live nodes size: [2])
11133 INFO (qtp344759431-38) [n:127.0.0.1:42549_solr ]
o.a.s.h.a.PrepRecoveryOp Going to wait for coreNodeName: core_node4, state:
recovering, checkLive: true, onlyIfLeader: true, onlyIfLeaderActive: true
11140 INFO (qtp344759431-38) [n:127.0.0.1:42549_solr ]
o.a.s.h.a.PrepRecoveryOp Will wait a max of 183 seconds to see
collection1_shard2_replica1 (shard2 of collection1) have state: recovering
11140 INFO (qtp344759431-38) [n:127.0.0.1:42549_solr ]
o.a.s.h.a.PrepRecoveryOp In WaitForState(recovering): collection=collection1,
shard=shard2, thisCore=collection1_shard2_replica1,
leaderDoesNotNeedRecovery=false, isLeader? true, live=true, checkLive=true,
currentState=recovering, localState=active, nodeName=127.0.0.1:36014_solr,
coreNodeName=core_node4, onlyIfActiveCheckResult=false, nodeProps:
core_node4:{"core":"collection1_shard2_replica2","base_url":"http://127.0.0.1:36014/solr","node_name":"127.0.0.1:36014_solr","state":"recovering"}
11140 INFO (qtp344759431-38) [n:127.0.0.1:42549_solr ]
o.a.s.h.a.PrepRecoveryOp Waited coreNodeName: core_node4, state: recovering,
checkLive: true, onlyIfLeader: true for: 0 seconds.
11140 INFO (qtp344759431-38) [n:127.0.0.1:42549_solr ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/cores
params={nodeName=127.0.0.1:36014_solr&onlyIfLeaderActive=true&core=collection1_shard2_replica1&coreNodeName=core_node4&action=PREPRECOVERY&checkLive=true&state=recovering&onlyIfLeader=true&wt=javabin&version=2}
status=0 QTime=7
11160 INFO (coreLoadExecutor-52-thread-3-processing-n:127.0.0.1:36014_solr)
[n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
x:collection2_shard1_replica2] o.a.s.s.SolrIndexSearcher Opening
[Searcher@28717b92[collection2_shard1_replica2] main]
11161 INFO (coreLoadExecutor-52-thread-3-processing-n:127.0.0.1:36014_solr)
[n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
x:collection2_shard1_replica2] o.a.s.r.ManagedResourceStorage Configured
ZooKeeperStorageIO with znodeBase: /configs/config
11162 INFO (coreLoadExecutor-52-thread-3-processing-n:127.0.0.1:36014_solr)
[n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
x:collection2_shard1_replica2] o.a.s.r.ManagedResourceStorage Loaded null at
path _rest_managed.json using ZooKeeperStorageIO:path=/configs/config
11162 INFO (coreLoadExecutor-52-thread-3-processing-n:127.0.0.1:36014_solr)
[n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
x:collection2_shard1_replica2] o.a.s.h.ReplicationHandler Commits will be
reserved for 10000
11162 INFO (searcherExecutor-54-thread-1-processing-n:127.0.0.1:36014_solr
x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1)
[n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
x:collection2_shard1_replica2] o.a.s.c.SolrCore [collection2_shard1_replica2]
Registered new searcher Searcher@28717b92[collection2_shard1_replica2]
main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_0(7.0.0):C1)))}
11168 INFO (coreZkRegister-47-thread-1-processing-n:127.0.0.1:36014_solr
x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1)
[n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
x:collection2_shard1_replica2] o.a.s.c.ZkController Core needs to
recover:collection2_shard1_replica2
11168 INFO (updateExecutor-20-thread-2-processing-n:127.0.0.1:36014_solr
x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1)
[n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
x:collection2_shard1_replica2] o.a.s.u.DefaultSolrCoreState Running recovery
11168 INFO (recoveryExecutor-21-thread-2-processing-n:127.0.0.1:36014_solr
x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1)
[n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
x:collection2_shard1_replica2] o.a.s.c.RecoveryStrategy Starting recovery
process. recoveringAfterStartup=true
11169 INFO (recoveryExecutor-21-thread-2-processing-n:127.0.0.1:36014_solr
x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1)
[n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
x:collection2_shard1_replica2] o.a.s.c.RecoveryStrategy ######
startupVersions=[[1556034713508904960]]
11169 INFO (recoveryExecutor-21-thread-2-processing-n:127.0.0.1:36014_solr
x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1)
[n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
x:collection2_shard1_replica2] o.a.s.c.RecoveryStrategy Begin buffering
updates. core=[collection2_shard1_replica2]
11169 INFO (recoveryExecutor-21-thread-2-processing-n:127.0.0.1:36014_solr
x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1)
[n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
x:collection2_shard1_replica2] o.a.s.u.UpdateLog Starting to buffer updates.
FSUpdateLog{state=ACTIVE, tlog=null}
11169 INFO (recoveryExecutor-21-thread-2-processing-n:127.0.0.1:36014_solr
x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1)
[n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
x:collection2_shard1_replica2] o.a.s.c.RecoveryStrategy Publishing state of
core [collection2_shard1_replica2] as recovering, leader is
[http://127.0.0.1:42549/solr/collection2_shard1_replica1/] and I am
[http://127.0.0.1:36014/solr/collection2_shard1_replica2/]
11171 INFO (recoveryExecutor-21-thread-2-processing-n:127.0.0.1:36014_solr
x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1)
[n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
x:collection2_shard1_replica2] o.a.s.c.RecoveryStrategy Sending prep recovery
command to [http://127.0.0.1:42549/solr]; [WaitForState:
action=PREPRECOVERY&core=collection2_shard1_replica1&nodeName=127.0.0.1:36014_solr&coreNodeName=core_node1&state=recovering&checkLive=true&onlyIfLeader=true&onlyIfLeaderActive=true]
11174 INFO (qtp344759431-37) [n:127.0.0.1:42549_solr ]
o.a.s.h.a.PrepRecoveryOp Going to wait for coreNodeName: core_node1, state:
recovering, checkLive: true, onlyIfLeader: true, onlyIfLeaderActive: true
11175 INFO (qtp344759431-37) [n:127.0.0.1:42549_solr ]
o.a.s.h.a.PrepRecoveryOp Will wait a max of 183 seconds to see
collection2_shard1_replica1 (shard1 of collection2) have state: recovering
11175 INFO (qtp344759431-37) [n:127.0.0.1:42549_solr ]
o.a.s.h.a.PrepRecoveryOp In WaitForState(recovering): collection=collection2,
shard=shard1, thisCore=collection2_shard1_replica1,
leaderDoesNotNeedRecovery=false, isLeader? true, live=true, checkLive=true,
currentState=down, localState=active, nodeName=127.0.0.1:36014_solr,
coreNodeName=core_node1, onlyIfActiveCheckResult=false, nodeProps:
core_node1:{"core":"collection2_shard1_replica2","base_url":"http://127.0.0.1:36014/solr","node_name":"127.0.0.1:36014_solr","state":"down"}
11278 INFO (zkCallback-23-thread-1-processing-n:127.0.0.1:36014_solr)
[n:127.0.0.1:36014_solr ] o.a.s.c.c.ZkStateReader A cluster state change:
[WatchedEvent state:SyncConnected type:NodeDataChanged
path:/collections/collection2/state.json] for collection [collection2] has
occurred - updating... (live nodes size: [2])
11278 INFO (zkCallback-12-thread-2-processing-n:127.0.0.1:42549_solr)
[n:127.0.0.1:42549_solr ] o.a.s.c.c.ZkStateReader A cluster state change:
[WatchedEvent state:SyncConnected type:NodeDataChanged
path:/collections/collection2/state.json] for collection [collection2] has
occurred - updating... (live nodes size: [2])
11323 ERROR
(updateExecutor-7-thread-1-processing-http:////127.0.0.1:36014//solr//collection2_shard1_replica2
x:collection2_shard1_replica1 r:core_node2 n:127.0.0.1:42549_solr s:shard1
c:collection2) [n:127.0.0.1:42549_solr c:collection2 s:shard1 r:core_node2
x:collection2_shard1_replica1] o.a.s.u.StreamingSolrClients error
java.net.SocketException: Broken pipe (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at
org.apache.http.impl.io.SessionOutputBufferImpl.streamWrite(SessionOutputBufferImpl.java:126)
at
org.apache.http.impl.io.SessionOutputBufferImpl.flushBuffer(SessionOutputBufferImpl.java:138)
at
org.apache.http.impl.io.SessionOutputBufferImpl.flush(SessionOutputBufferImpl.java:146)
at
org.apache.http.impl.io.ChunkedOutputStream.close(ChunkedOutputStream.java:205)
at
org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:159)
at
org.apache.http.impl.conn.CPoolProxy.sendRequestEntity(CPoolProxy.java:162)
at
org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:237)
at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:122)
at
org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:61)
at
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
at
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
at
org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:323)
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:186)
at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
was:
When a node is recovering, the leader can meet an exception when trying to send
an update to the buffering node. So the leader will try running LIR process:
first set the node's state to DOWN, then send recovery OP to the node.
In the same time, PrepRecoveryOp will make the leader wait for a very long time
to see the node's state is RECOVERING.
This scenario can easily be achieved by using this test
{code}
String collection = "collection2";
CollectionAdminRequest
.createCollection(collection, "config", 1, 2)
.setMaxShardsPerNode(1)
.process(cluster.getSolrClient());
AbstractDistribZkTestBase.waitForRecoveriesToFinish(collection,
cluster.getSolrClient().getZkStateReader(),
false, true, 30);
CloudSolrClient cloudClient = cluster.getSolrClient();
DocCollection docCollection =
cloudClient.getZkStateReader().getClusterState().getCollection(collection);
Slice slice = docCollection.getSlice("shard1");
Replica replicaNode = slice.getReplicas(replica -> replica !=
slice.getLeader()).get(0);
JettySolrRunner replicaRunner = cluster.getReplicaJetty(replicaNode);
new UpdateRequest()
.add(sdoc("id", "1"))
.process(cloudClient, collection);
ChaosMonkey.stop(replicaRunner);
new UpdateRequest()
.add(sdoc("id", "2"))
.process(cloudClient, collection);
ChaosMonkey.start(replicaRunner);
new UpdateRequest()
.add(sdoc("id", "3"))
.process(cloudClient, collection);
AbstractDistribZkTestBase.waitForRecoveriesToFinish(collection,
cluster.getSolrClient().getZkStateReader(),
false, true, 60);
CollectionAdminRequest
.deleteCollection(collection)
.process(cloudClient);
{code}
> LIR should check the node is recovering before bring it down
> ------------------------------------------------------------
>
> Key: SOLR-9945
> URL: https://issues.apache.org/jira/browse/SOLR-9945
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Cao Manh Dat
>
> When a node is recovering, the leader can meet an exception when trying to
> send an update to the buffering node. So the leader will try running LIR
> process: first set the node's state to DOWN, then send recovery OP to the
> node.
> In the same time, PrepRecoveryOp will make the leader wait for a very long
> time to see the node's state is RECOVERING.
> This scenario can easily be achieved by using this test
> {code}
> String collection = "collection2";
> CollectionAdminRequest
> .createCollection(collection, "config", 1, 2)
> .setMaxShardsPerNode(1)
> .process(cluster.getSolrClient());
> AbstractDistribZkTestBase.waitForRecoveriesToFinish(collection,
> cluster.getSolrClient().getZkStateReader(),
> false, true, 30);
> CloudSolrClient cloudClient = cluster.getSolrClient();
> DocCollection docCollection =
> cloudClient.getZkStateReader().getClusterState().getCollection(collection);
> Slice slice = docCollection.getSlice("shard1");
> Replica replicaNode = slice.getReplicas(replica -> replica !=
> slice.getLeader()).get(0);
> JettySolrRunner replicaRunner = cluster.getReplicaJetty(replicaNode);
> new UpdateRequest()
> .add(sdoc("id", "1"))
> .process(cloudClient, collection);
> ChaosMonkey.stop(replicaRunner);
> new UpdateRequest()
> .add(sdoc("id", "2"))
> .process(cloudClient, collection);
> ChaosMonkey.start(replicaRunner);
> new UpdateRequest()
> .add(sdoc("id", "3"))
> .process(cloudClient, collection);
> AbstractDistribZkTestBase.waitForRecoveriesToFinish(collection,
> cluster.getSolrClient().getZkStateReader(),
> false, true, 60);
> CollectionAdminRequest
> .deleteCollection(collection)
> .process(cloudClient);
> {code}
> Here are some log
> {code}
> 10942 INFO (searcherExecutor-53-thread-1-processing-n:127.0.0.1:36014_solr
> x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4)
> [n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4
> x:collection1_shard2_replica2] o.a.s.c.SolrCore [collection1_shard2_replica2]
> Registered new searcher Searcher@510d1ea5[collection1_shard2_replica2]
> main{ExitableDirectoryReader(UninvertingDirectoryReader())}
> 10942 INFO (searcherExecutor-56-thread-1-processing-n:127.0.0.1:36014_solr
> x:collection1_shard1_replica2 s:shard1 c:collection1 r:core_node3)
> [n:127.0.0.1:36014_solr c:collection1 s:shard1 r:core_node3
> x:collection1_shard1_replica2] o.a.s.c.SolrCore [collection1_shard1_replica2]
> Registered new searcher Searcher@505d7839[collection1_shard1_replica2]
> main{ExitableDirectoryReader(UninvertingDirectoryReader())}
> 10958 INFO (coreZkRegister-47-thread-1-processing-n:127.0.0.1:36014_solr
> x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4)
> [n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4
> x:collection1_shard2_replica2] o.a.s.c.ZkController Core needs to
> recover:collection1_shard2_replica2
> 10964 INFO (updateExecutor-20-thread-1-processing-n:127.0.0.1:36014_solr
> x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4)
> [n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4
> x:collection1_shard2_replica2] o.a.s.u.DefaultSolrCoreState Running recovery
> 10970 INFO (recoveryExecutor-21-thread-1-processing-n:127.0.0.1:36014_solr
> x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4)
> [n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4
> x:collection1_shard2_replica2] o.a.s.c.RecoveryStrategy Starting recovery
> process. recoveringAfterStartup=true
> 10971 INFO (recoveryExecutor-21-thread-1-processing-n:127.0.0.1:36014_solr
> x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4)
> [n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4
> x:collection1_shard2_replica2] o.a.s.c.RecoveryStrategy ######
> startupVersions=[[-1556034709770731520]]
> 10971 INFO (recoveryExecutor-21-thread-1-processing-n:127.0.0.1:36014_solr
> x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4)
> [n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4
> x:collection1_shard2_replica2] o.a.s.c.RecoveryStrategy Begin buffering
> updates. core=[collection1_shard2_replica2]
> 10971 INFO (recoveryExecutor-21-thread-1-processing-n:127.0.0.1:36014_solr
> x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4)
> [n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4
> x:collection1_shard2_replica2] o.a.s.u.UpdateLog Starting to buffer updates.
> FSUpdateLog{state=ACTIVE, tlog=null}
> 10971 INFO (recoveryExecutor-21-thread-1-processing-n:127.0.0.1:36014_solr
> x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4)
> [n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4
> x:collection1_shard2_replica2] o.a.s.c.RecoveryStrategy Publishing state of
> core [collection1_shard2_replica2] as recovering, leader is
> [http://127.0.0.1:42549/solr/collection1_shard2_replica1/] and I am
> [http://127.0.0.1:36014/solr/collection1_shard2_replica2/]
> 11129 INFO (recoveryExecutor-21-thread-1-processing-n:127.0.0.1:36014_solr
> x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4)
> [n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4
> x:collection1_shard2_replica2] o.a.s.c.RecoveryStrategy Sending prep recovery
> command to [http://127.0.0.1:42549/solr]; [WaitForState:
> action=PREPRECOVERY&core=collection1_shard2_replica1&nodeName=127.0.0.1:36014_solr&coreNodeName=core_node4&state=recovering&checkLive=true&onlyIfLeader=true&onlyIfLeaderActive=true]
> 11131 INFO (zkCallback-12-thread-2-processing-n:127.0.0.1:42549_solr)
> [n:127.0.0.1:42549_solr ] o.a.s.c.c.ZkStateReader A cluster state change:
> [WatchedEvent state:SyncConnected type:NodeDataChanged
> path:/collections/collection1/state.json] for collection [collection1] has
> occurred - updating... (live nodes size: [2])
> 11131 INFO (zkCallback-23-thread-1-processing-n:127.0.0.1:36014_solr)
> [n:127.0.0.1:36014_solr ] o.a.s.c.c.ZkStateReader A cluster state change:
> [WatchedEvent state:SyncConnected type:NodeDataChanged
> path:/collections/collection1/state.json] for collection [collection1] has
> occurred - updating... (live nodes size: [2])
> 11133 INFO (qtp344759431-38) [n:127.0.0.1:42549_solr ]
> o.a.s.h.a.PrepRecoveryOp Going to wait for coreNodeName: core_node4, state:
> recovering, checkLive: true, onlyIfLeader: true, onlyIfLeaderActive: true
> 11140 INFO (qtp344759431-38) [n:127.0.0.1:42549_solr ]
> o.a.s.h.a.PrepRecoveryOp Will wait a max of 183 seconds to see
> collection1_shard2_replica1 (shard2 of collection1) have state: recovering
> 11140 INFO (qtp344759431-38) [n:127.0.0.1:42549_solr ]
> o.a.s.h.a.PrepRecoveryOp In WaitForState(recovering): collection=collection1,
> shard=shard2, thisCore=collection1_shard2_replica1,
> leaderDoesNotNeedRecovery=false, isLeader? true, live=true, checkLive=true,
> currentState=recovering, localState=active, nodeName=127.0.0.1:36014_solr,
> coreNodeName=core_node4, onlyIfActiveCheckResult=false, nodeProps:
> core_node4:{"core":"collection1_shard2_replica2","base_url":"http://127.0.0.1:36014/solr","node_name":"127.0.0.1:36014_solr","state":"recovering"}
> 11140 INFO (qtp344759431-38) [n:127.0.0.1:42549_solr ]
> o.a.s.h.a.PrepRecoveryOp Waited coreNodeName: core_node4, state: recovering,
> checkLive: true, onlyIfLeader: true for: 0 seconds.
> 11140 INFO (qtp344759431-38) [n:127.0.0.1:42549_solr ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> params={nodeName=127.0.0.1:36014_solr&onlyIfLeaderActive=true&core=collection1_shard2_replica1&coreNodeName=core_node4&action=PREPRECOVERY&checkLive=true&state=recovering&onlyIfLeader=true&wt=javabin&version=2}
> status=0 QTime=7
> 11160 INFO (coreLoadExecutor-52-thread-3-processing-n:127.0.0.1:36014_solr)
> [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
> x:collection2_shard1_replica2] o.a.s.s.SolrIndexSearcher Opening
> [Searcher@28717b92[collection2_shard1_replica2] main]
> 11161 INFO (coreLoadExecutor-52-thread-3-processing-n:127.0.0.1:36014_solr)
> [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
> x:collection2_shard1_replica2] o.a.s.r.ManagedResourceStorage Configured
> ZooKeeperStorageIO with znodeBase: /configs/config
> 11162 INFO (coreLoadExecutor-52-thread-3-processing-n:127.0.0.1:36014_solr)
> [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
> x:collection2_shard1_replica2] o.a.s.r.ManagedResourceStorage Loaded null at
> path _rest_managed.json using ZooKeeperStorageIO:path=/configs/config
> 11162 INFO (coreLoadExecutor-52-thread-3-processing-n:127.0.0.1:36014_solr)
> [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
> x:collection2_shard1_replica2] o.a.s.h.ReplicationHandler Commits will be
> reserved for 10000
> 11162 INFO (searcherExecutor-54-thread-1-processing-n:127.0.0.1:36014_solr
> x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1)
> [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
> x:collection2_shard1_replica2] o.a.s.c.SolrCore [collection2_shard1_replica2]
> Registered new searcher Searcher@28717b92[collection2_shard1_replica2]
> main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_0(7.0.0):C1)))}
> 11168 INFO (coreZkRegister-47-thread-1-processing-n:127.0.0.1:36014_solr
> x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1)
> [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
> x:collection2_shard1_replica2] o.a.s.c.ZkController Core needs to
> recover:collection2_shard1_replica2
> 11168 INFO (updateExecutor-20-thread-2-processing-n:127.0.0.1:36014_solr
> x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1)
> [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
> x:collection2_shard1_replica2] o.a.s.u.DefaultSolrCoreState Running recovery
> 11168 INFO (recoveryExecutor-21-thread-2-processing-n:127.0.0.1:36014_solr
> x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1)
> [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
> x:collection2_shard1_replica2] o.a.s.c.RecoveryStrategy Starting recovery
> process. recoveringAfterStartup=true
> 11169 INFO (recoveryExecutor-21-thread-2-processing-n:127.0.0.1:36014_solr
> x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1)
> [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
> x:collection2_shard1_replica2] o.a.s.c.RecoveryStrategy ######
> startupVersions=[[1556034713508904960]]
> 11169 INFO (recoveryExecutor-21-thread-2-processing-n:127.0.0.1:36014_solr
> x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1)
> [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
> x:collection2_shard1_replica2] o.a.s.c.RecoveryStrategy Begin buffering
> updates. core=[collection2_shard1_replica2]
> 11169 INFO (recoveryExecutor-21-thread-2-processing-n:127.0.0.1:36014_solr
> x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1)
> [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
> x:collection2_shard1_replica2] o.a.s.u.UpdateLog Starting to buffer updates.
> FSUpdateLog{state=ACTIVE, tlog=null}
> 11169 INFO (recoveryExecutor-21-thread-2-processing-n:127.0.0.1:36014_solr
> x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1)
> [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
> x:collection2_shard1_replica2] o.a.s.c.RecoveryStrategy Publishing state of
> core [collection2_shard1_replica2] as recovering, leader is
> [http://127.0.0.1:42549/solr/collection2_shard1_replica1/] and I am
> [http://127.0.0.1:36014/solr/collection2_shard1_replica2/]
> 11171 INFO (recoveryExecutor-21-thread-2-processing-n:127.0.0.1:36014_solr
> x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1)
> [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1
> x:collection2_shard1_replica2] o.a.s.c.RecoveryStrategy Sending prep recovery
> command to [http://127.0.0.1:42549/solr]; [WaitForState:
> action=PREPRECOVERY&core=collection2_shard1_replica1&nodeName=127.0.0.1:36014_solr&coreNodeName=core_node1&state=recovering&checkLive=true&onlyIfLeader=true&onlyIfLeaderActive=true]
> 11174 INFO (qtp344759431-37) [n:127.0.0.1:42549_solr ]
> o.a.s.h.a.PrepRecoveryOp Going to wait for coreNodeName: core_node1, state:
> recovering, checkLive: true, onlyIfLeader: true, onlyIfLeaderActive: true
> 11175 INFO (qtp344759431-37) [n:127.0.0.1:42549_solr ]
> o.a.s.h.a.PrepRecoveryOp Will wait a max of 183 seconds to see
> collection2_shard1_replica1 (shard1 of collection2) have state: recovering
> 11175 INFO (qtp344759431-37) [n:127.0.0.1:42549_solr ]
> o.a.s.h.a.PrepRecoveryOp In WaitForState(recovering): collection=collection2,
> shard=shard1, thisCore=collection2_shard1_replica1,
> leaderDoesNotNeedRecovery=false, isLeader? true, live=true, checkLive=true,
> currentState=down, localState=active, nodeName=127.0.0.1:36014_solr,
> coreNodeName=core_node1, onlyIfActiveCheckResult=false, nodeProps:
> core_node1:{"core":"collection2_shard1_replica2","base_url":"http://127.0.0.1:36014/solr","node_name":"127.0.0.1:36014_solr","state":"down"}
> 11278 INFO (zkCallback-23-thread-1-processing-n:127.0.0.1:36014_solr)
> [n:127.0.0.1:36014_solr ] o.a.s.c.c.ZkStateReader A cluster state change:
> [WatchedEvent state:SyncConnected type:NodeDataChanged
> path:/collections/collection2/state.json] for collection [collection2] has
> occurred - updating... (live nodes size: [2])
> 11278 INFO (zkCallback-12-thread-2-processing-n:127.0.0.1:42549_solr)
> [n:127.0.0.1:42549_solr ] o.a.s.c.c.ZkStateReader A cluster state change:
> [WatchedEvent state:SyncConnected type:NodeDataChanged
> path:/collections/collection2/state.json] for collection [collection2] has
> occurred - updating... (live nodes size: [2])
> 11323 ERROR
> (updateExecutor-7-thread-1-processing-http:////127.0.0.1:36014//solr//collection2_shard1_replica2
> x:collection2_shard1_replica1 r:core_node2 n:127.0.0.1:42549_solr s:shard1
> c:collection2) [n:127.0.0.1:42549_solr c:collection2 s:shard1 r:core_node2
> x:collection2_shard1_replica1] o.a.s.u.StreamingSolrClients error
> java.net.SocketException: Broken pipe (Write failed)
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
> at
> org.apache.http.impl.io.SessionOutputBufferImpl.streamWrite(SessionOutputBufferImpl.java:126)
> at
> org.apache.http.impl.io.SessionOutputBufferImpl.flushBuffer(SessionOutputBufferImpl.java:138)
> at
> org.apache.http.impl.io.SessionOutputBufferImpl.flush(SessionOutputBufferImpl.java:146)
> at
> org.apache.http.impl.io.ChunkedOutputStream.close(ChunkedOutputStream.java:205)
> at
> org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:159)
> at
> org.apache.http.impl.conn.CPoolProxy.sendRequestEntity(CPoolProxy.java:162)
> at
> org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:237)
> at
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:122)
> at
> org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:61)
> at
> org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
> at
> org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
> at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
> at
> org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
> at
> org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
> at
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
> at
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
> at
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:323)
> at
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:186)
> at
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]