[ 
https://issues.apache.org/jira/browse/SOLR-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995755#comment-14995755
 ] 

Ishan Chattopadhyaya edited comment on SOLR-8249 at 11/8/15 7:23 PM:
---------------------------------------------------------------------

I thought the last patch (which tested for stats on LEADER operation instead of 
STATE) would fix the failures. However, this is still failing at Steve's 
jenkins: http://jenkins.sarowe.net/job/Lucene-Solr-tests-trunk/3627/

Here's the relevant log from that test:
{noformat}
   [junit4]   2> 200957 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.OverseerAutoReplicaFailoverThread Starting 
OverseerAutoReplicaFailoverThread autoReplicaFailoverWorkLoopDelay=10000 
autoReplicaFailoverWaitAfterExpiration=30000 
autoReplicaFailoverBadNodeExpiration=60000
   [junit4]   2> 200957 INFO  (zkCallback-279-thread-2) [    ] 
o.a.s.c.c.ZkStateReader A live node change: WatchedEvent state:SyncConnected 
type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live 
nodes size: 0)
   [junit4]   2> 200971 INFO  
(OverseerCollectionConfigSetProcessor-94830728503164930-127.0.0.1:50527_solr-n_0000000000)
 [    ] o.a.s.c.OverseerTaskProcessor Process current queue of overseer 
operations
   [junit4]   2> 201010 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [    
] o.a.s.c.Overseer Starting to work on the main queue
   [junit4]   2> 201012 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [    
] o.a.s.c.Overseer processMessage: queueSize: 1, message = {
   [junit4]   2>   "operation":"state",
   [junit4]   2>   "state":"recovering",
   [junit4]   2>   "node_name":"node1",
   [junit4]   2>   "core":"core1",
   [junit4]   2>   "core_node_name":"core_node1",
   [junit4]   2>   "collection":"collection1",
   [junit4]   2>   "numShards":"1",
   [junit4]   2>   "base_url":"http://node1/solr/"} current state version: 0
   [junit4]   2> 201012 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [    
] o.a.s.c.o.ReplicaMutator Update state numShards=1 message={
   [junit4]   2>   "operation":"state",
   [junit4]   2>   "state":"recovering",
   [junit4]   2>   "node_name":"node1",
   [junit4]   2>   "core":"core1",
   [junit4]   2>   "core_node_name":"core_node1",
   [junit4]   2>   "collection":"collection1",
   [junit4]   2>   "numShards":"1",
   [junit4]   2>   "base_url":"http://node1/solr/"}
   [junit4]   2> 201012 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [    
] o.a.s.c.o.ClusterStateMutator building a new cName: collection1
   [junit4]   2> 201013 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [    
] o.a.s.c.o.ReplicaMutator Assigning new node to shard shard=shard1
   [junit4]   2> 201013 INFO  (zkCallback-278-thread-1) [    ] 
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent 
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred 
- updating... (live nodes size: 1)
   [junit4]   2> 201013 INFO  (zkCallback-279-thread-2) [    ] 
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent 
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred 
- updating... (live nodes size: 1)
   [junit4]   2> 201481 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.c.SolrZkClient makePath: 
/collections/collection1/leader_elect/shard1/election
   [junit4]   2> 201481 INFO  (zkCallback-278-thread-1) [    ] 
o.a.s.c.c.ZkStateReader A collections change: WatchedEvent state:SyncConnected 
type:NodeChildrenChanged path:/collections, has occurred - updating...
   [junit4]   2> 201481 INFO  (zkCallback-279-thread-2) [    ] 
o.a.s.c.c.ZkStateReader A collections change: WatchedEvent state:SyncConnected 
type:NodeChildrenChanged path:/collections, has occurred - updating...
   [junit4]   2> 201483 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.LeaderElector Joined leadership election with path: 
/collections/collection1/leader_elect/shard1/election/94830728503164931-node1_core1-n_0000000000
   [junit4]   2> 201483 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.c.SolrZkClient makePath: /collections/collection1/leaders/shard1
   [junit4]   2> 201484 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.ShardLeaderElectionContextBase Creating leader registration node
   [junit4]   2> 201486 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [    
] o.a.s.c.Overseer processMessage: queueSize: 1, message = {
   [junit4]   2>   "operation":"state",
   [junit4]   2>   "state":"active",
   [junit4]   2>   "shard":"shard1",
   [junit4]   2>   "collection":"collection1",
   [junit4]   2>   "base_url":"http://node1/solr/";,
   [junit4]   2>   "node_name":"node1",
   [junit4]   2>   "core_node_name":"core_node1",
   [junit4]   2>   "core":"core1"} current state version: 1
   [junit4]   2> 201487 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [    
] o.a.s.c.o.ReplicaMutator Update state numShards=null message={
   [junit4]   2>   "operation":"state",
   [junit4]   2>   "state":"active",
   [junit4]   2>   "shard":"shard1",
   [junit4]   2>   "collection":"collection1",
   [junit4]   2>   "base_url":"http://node1/solr/";,
   [junit4]   2>   "node_name":"node1",
   [junit4]   2>   "core_node_name":"core_node1",
   [junit4]   2>   "core":"core1"}
   [junit4]   2> 201488 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [    
] o.a.s.c.Overseer processMessage: queueSize: 1, message = {
   [junit4]   2>   "operation":"leader",
   [junit4]   2>   "shard":"shard1",
   [junit4]   2>   "collection":"collection1",
   [junit4]   2>   "base_url":"http://node1/solr/";,
   [junit4]   2>   "core":"core1"} current state version: 1
   [junit4]   2> 201489 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.Overseer Overseer 
(id=94830728503164930-127.0.0.1:50527_solr-n_0000000000) closing
   [junit4]   2> 201489 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.ElectionContext Canceling election 
/overseer_elect/election/94830728503164930-127.0.0.1:50527_solr-n_0000000000
   [junit4]   2> 201490 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [    
] o.a.s.c.Overseer Overseer Loop exiting : 127.0.0.1:50527_solr
   [junit4]   2> 201491 INFO  (OverseerExitThread) [    ] o.a.s.c.Overseer I'm 
exiting , but I'm still the leader
   [junit4]   2> 201491 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.LeaderElector Joined leadership election with path: 
/overseer_elect/election/94830728503164930-127.0.0.1:50527_solr-n_0000000001
   [junit4]   2> 201492 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.OverseerElectionContext I am going to be the leader 127.0.0.1:50527_solr
   [junit4]   2> 201492 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.c.SolrZkClient makePath: /overseer_elect/leader
   [junit4]   2> 201492 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.Overseer Overseer 
(id=94830728503164930-127.0.0.1:50527_solr-n_0000000001) starting
   [junit4]   2> 201494 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.OverseerAutoReplicaFailoverThread Starting 
OverseerAutoReplicaFailoverThread autoReplicaFailoverWorkLoopDelay=10000 
autoReplicaFailoverWaitAfterExpiration=30000 
autoReplicaFailoverBadNodeExpiration=60000
   [junit4]   2> 201538 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000001) [    
] o.a.s.c.Overseer Starting to work on the main queue
   [junit4]   2> 201541 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000001) [    
] o.a.s.c.Overseer processMessage: workQueueSize: 2, message = {
   [junit4]   2>   "operation":"state",
   [junit4]   2>   "state":"active",
   [junit4]   2>   "shard":"shard1",
   [junit4]   2>   "collection":"collection1",
   [junit4]   2>   "base_url":"http://node1/solr/";,
   [junit4]   2>   "node_name":"node1",
   [junit4]   2>   "core_node_name":"core_node1",
   [junit4]   2>   "core":"core1"}
   [junit4]   2> 201541 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000001) [    
] o.a.s.c.o.ReplicaMutator Update state numShards=null message={
   [junit4]   2>   "operation":"state",
   [junit4]   2>   "state":"active",
   [junit4]   2>   "shard":"shard1",
   [junit4]   2>   "collection":"collection1",
   [junit4]   2>   "base_url":"http://node1/solr/";,
   [junit4]   2>   "node_name":"node1",
   [junit4]   2>   "core_node_name":"core_node1",
   [junit4]   2>   "core":"core1"}
   [junit4]   2> 201541 INFO  (zkCallback-279-thread-2) [    ] 
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent 
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred 
- updating... (live nodes size: 1)
   [junit4]   2> 201541 INFO  (zkCallback-278-thread-1) [    ] 
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent 
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred 
- updating... (live nodes size: 1)
   [junit4]   2> 201542 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000001) [    
] o.a.s.c.Overseer processMessage: workQueueSize: 2, message = {
   [junit4]   2>   "operation":"leader",
   [junit4]   2>   "shard":"shard1",
   [junit4]   2>   "collection":"collection1",
   [junit4]   2>   "base_url":"http://node1/solr/";,
   [junit4]   2>   "core":"core1"}
   [junit4]   2> 201542 INFO  (zkCallback-279-thread-2) [    ] 
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent 
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred 
- updating... (live nodes size: 1)
   [junit4]   2> 201542 INFO  
(OverseerCollectionConfigSetProcessor-94830728503164930-127.0.0.1:50527_solr-n_0000000001)
 [    ] o.a.s.c.OverseerTaskProcessor Process current queue of overseer 
operations
   [junit4]   2> 201543 INFO  (zkCallback-278-thread-1) [    ] 
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent 
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred 
- updating... (live nodes size: 1)
   [junit4]   2> 201551 INFO  (zkCallback-278-thread-1) [    ] 
o.a.s.c.c.ZkStateReader A live node change: WatchedEvent state:SyncConnected 
type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live 
nodes size: 1)
   [junit4]   2> 201551 INFO  (zkCallback-279-thread-2) [    ] 
o.a.s.c.c.ZkStateReader A live node change: WatchedEvent state:SyncConnected 
type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live 
nodes size: 1)
{noformat}

This failed because there was a second leader election (or maybe the overseer 
queue messages from the first leader election was reprocessed after the 
overseer was shutdown) between the two asserts. I have not been able to 
reproduce it myself, but Steve's jenkins is a very fast machine.

Now, I am thinking of changing tack, and switching back to testing for STATE 
message stats. The problem was initially that because there was a RECOVERING 
STATE message passed in, the leader election sent in an extra ACTIVE message 
(owing to SOLR-7989). However, if we send in ACTIVE instead of RECOVERING, 
there won't be another STATE message sent in during leader election (even if 
there is a second leader election). This, I think will fix the test. Attaching 
the patch for this.

I'll seek Steve's help and beast it on his machine to be sure.


was (Author: ichattopadhyaya):
I thought the last patch (which tested for stats on LEADER operation instead of 
STATE) would fix the failures. However, this is still failing at Steve's 
jenkins: http://jenkins.sarowe.net/job/Lucene-Solr-tests-trunk/3627/

Here's the relevant log from that test:
{noformat}
   [junit4]   2> 200957 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.OverseerAutoReplicaFailoverThread Starting 
OverseerAutoReplicaFailoverThread autoReplicaFailoverWorkLoopDelay=10000 
autoReplicaFailoverWaitAfterExpiration=30000 
autoReplicaFailoverBadNodeExpiration=60000
   [junit4]   2> 200957 INFO  (zkCallback-279-thread-2) [    ] 
o.a.s.c.c.ZkStateReader A live node change: WatchedEvent state:SyncConnected 
type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live 
nodes size: 0)
   [junit4]   2> 200971 INFO  
(OverseerCollectionConfigSetProcessor-94830728503164930-127.0.0.1:50527_solr-n_0000000000)
 [    ] o.a.s.c.OverseerTaskProcessor Process current queue of overseer 
operations
   [junit4]   2> 201010 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [    
] o.a.s.c.Overseer Starting to work on the main queue
   [junit4]   2> 201012 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [    
] o.a.s.c.Overseer processMessage: queueSize: 1, message = {
   [junit4]   2>   "operation":"state",
   [junit4]   2>   "state":"recovering",
   [junit4]   2>   "node_name":"node1",
   [junit4]   2>   "core":"core1",
   [junit4]   2>   "core_node_name":"core_node1",
   [junit4]   2>   "collection":"collection1",
   [junit4]   2>   "numShards":"1",
   [junit4]   2>   "base_url":"http://node1/solr/"} current state version: 0
   [junit4]   2> 201012 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [    
] o.a.s.c.o.ReplicaMutator Update state numShards=1 message={
   [junit4]   2>   "operation":"state",
   [junit4]   2>   "state":"recovering",
   [junit4]   2>   "node_name":"node1",
   [junit4]   2>   "core":"core1",
   [junit4]   2>   "core_node_name":"core_node1",
   [junit4]   2>   "collection":"collection1",
   [junit4]   2>   "numShards":"1",
   [junit4]   2>   "base_url":"http://node1/solr/"}
   [junit4]   2> 201012 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [    
] o.a.s.c.o.ClusterStateMutator building a new cName: collection1
   [junit4]   2> 201013 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [    
] o.a.s.c.o.ReplicaMutator Assigning new node to shard shard=shard1
   [junit4]   2> 201013 INFO  (zkCallback-278-thread-1) [    ] 
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent 
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred 
- updating... (live nodes size: 1)
   [junit4]   2> 201013 INFO  (zkCallback-279-thread-2) [    ] 
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent 
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred 
- updating... (live nodes size: 1)
   [junit4]   2> 201481 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.c.SolrZkClient makePath: 
/collections/collection1/leader_elect/shard1/election
   [junit4]   2> 201481 INFO  (zkCallback-278-thread-1) [    ] 
o.a.s.c.c.ZkStateReader A collections change: WatchedEvent state:SyncConnected 
type:NodeChildrenChanged path:/collections, has occurred - updating...
   [junit4]   2> 201481 INFO  (zkCallback-279-thread-2) [    ] 
o.a.s.c.c.ZkStateReader A collections change: WatchedEvent state:SyncConnected 
type:NodeChildrenChanged path:/collections, has occurred - updating...
   [junit4]   2> 201483 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.LeaderElector Joined leadership election with path: 
/collections/collection1/leader_elect/shard1/election/94830728503164931-node1_core1-n_0000000000
   [junit4]   2> 201483 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.c.SolrZkClient makePath: /collections/collection1/leaders/shard1
   [junit4]   2> 201484 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.ShardLeaderElectionContextBase Creating leader registration node
   [junit4]   2> 201486 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [    
] o.a.s.c.Overseer processMessage: queueSize: 1, message = {
   [junit4]   2>   "operation":"state",
   [junit4]   2>   "state":"active",
   [junit4]   2>   "shard":"shard1",
   [junit4]   2>   "collection":"collection1",
   [junit4]   2>   "base_url":"http://node1/solr/";,
   [junit4]   2>   "node_name":"node1",
   [junit4]   2>   "core_node_name":"core_node1",
   [junit4]   2>   "core":"core1"} current state version: 1
   [junit4]   2> 201487 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [    
] o.a.s.c.o.ReplicaMutator Update state numShards=null message={
   [junit4]   2>   "operation":"state",
   [junit4]   2>   "state":"active",
   [junit4]   2>   "shard":"shard1",
   [junit4]   2>   "collection":"collection1",
   [junit4]   2>   "base_url":"http://node1/solr/";,
   [junit4]   2>   "node_name":"node1",
   [junit4]   2>   "core_node_name":"core_node1",
   [junit4]   2>   "core":"core1"}
   [junit4]   2> 201488 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [    
] o.a.s.c.Overseer processMessage: queueSize: 1, message = {
   [junit4]   2>   "operation":"leader",
   [junit4]   2>   "shard":"shard1",
   [junit4]   2>   "collection":"collection1",
   [junit4]   2>   "base_url":"http://node1/solr/";,
   [junit4]   2>   "core":"core1"} current state version: 1
   [junit4]   2> 201489 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.Overseer Overseer 
(id=94830728503164930-127.0.0.1:50527_solr-n_0000000000) closing
   [junit4]   2> 201489 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.ElectionContext Canceling election 
/overseer_elect/election/94830728503164930-127.0.0.1:50527_solr-n_0000000000
   [junit4]   2> 201490 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [    
] o.a.s.c.Overseer Overseer Loop exiting : 127.0.0.1:50527_solr
   [junit4]   2> 201491 INFO  (OverseerExitThread) [    ] o.a.s.c.Overseer I'm 
exiting , but I'm still the leader
   [junit4]   2> 201491 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.LeaderElector Joined leadership election with path: 
/overseer_elect/election/94830728503164930-127.0.0.1:50527_solr-n_0000000001
   [junit4]   2> 201492 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.OverseerElectionContext I am going to be the leader 127.0.0.1:50527_solr
   [junit4]   2> 201492 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.c.SolrZkClient makePath: /overseer_elect/leader
   [junit4]   2> 201492 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.Overseer Overseer 
(id=94830728503164930-127.0.0.1:50527_solr-n_0000000001) starting
   [junit4]   2> 201494 INFO  
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [    ] 
o.a.s.c.OverseerAutoReplicaFailoverThread Starting 
OverseerAutoReplicaFailoverThread autoReplicaFailoverWorkLoopDelay=10000 
autoReplicaFailoverWaitAfterExpiration=30000 
autoReplicaFailoverBadNodeExpiration=60000
   [junit4]   2> 201538 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000001) [    
] o.a.s.c.Overseer Starting to work on the main queue
   [junit4]   2> 201541 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000001) [    
] o.a.s.c.Overseer processMessage: workQueueSize: 2, message = {
   [junit4]   2>   "operation":"state",
   [junit4]   2>   "state":"active",
   [junit4]   2>   "shard":"shard1",
   [junit4]   2>   "collection":"collection1",
   [junit4]   2>   "base_url":"http://node1/solr/";,
   [junit4]   2>   "node_name":"node1",
   [junit4]   2>   "core_node_name":"core_node1",
   [junit4]   2>   "core":"core1"}
   [junit4]   2> 201541 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000001) [    
] o.a.s.c.o.ReplicaMutator Update state numShards=null message={
   [junit4]   2>   "operation":"state",
   [junit4]   2>   "state":"active",
   [junit4]   2>   "shard":"shard1",
   [junit4]   2>   "collection":"collection1",
   [junit4]   2>   "base_url":"http://node1/solr/";,
   [junit4]   2>   "node_name":"node1",
   [junit4]   2>   "core_node_name":"core_node1",
   [junit4]   2>   "core":"core1"}
   [junit4]   2> 201541 INFO  (zkCallback-279-thread-2) [    ] 
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent 
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred 
- updating... (live nodes size: 1)
   [junit4]   2> 201541 INFO  (zkCallback-278-thread-1) [    ] 
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent 
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred 
- updating... (live nodes size: 1)
   [junit4]   2> 201542 INFO  
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000001) [    
] o.a.s.c.Overseer processMessage: workQueueSize: 2, message = {
   [junit4]   2>   "operation":"leader",
   [junit4]   2>   "shard":"shard1",
   [junit4]   2>   "collection":"collection1",
   [junit4]   2>   "base_url":"http://node1/solr/";,
   [junit4]   2>   "core":"core1"}
   [junit4]   2> 201542 INFO  (zkCallback-279-thread-2) [    ] 
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent 
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred 
- updating... (live nodes size: 1)
   [junit4]   2> 201542 INFO  
(OverseerCollectionConfigSetProcessor-94830728503164930-127.0.0.1:50527_solr-n_0000000001)
 [    ] o.a.s.c.OverseerTaskProcessor Process current queue of overseer 
operations
   [junit4]   2> 201543 INFO  (zkCallback-278-thread-1) [    ] 
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent 
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred 
- updating... (live nodes size: 1)
   [junit4]   2> 201551 INFO  (zkCallback-278-thread-1) [    ] 
o.a.s.c.c.ZkStateReader A live node change: WatchedEvent state:SyncConnected 
type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live 
nodes size: 1)
   [junit4]   2> 201551 INFO  (zkCallback-279-thread-2) [    ] 
o.a.s.c.c.ZkStateReader A live node change: WatchedEvent state:SyncConnected 
type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live 
nodes size: 1)
{noformat}

This failed because there was a second leader election between the two asserts. 
I have not been able to reproduce it myself, but Steve's jenkins is a very fast 
machine.

Now, I am thinking of changing tack, and switching back to testing for STATE 
message stats. The problem was initially that because there was a RECOVERING 
STATE message passed in, the leader election sent in an extra ACTIVE message 
(owing to SOLR-7989). However, if we send in ACTIVE instead of RECOVERING, 
there won't be another STATE message sent in during leader election (even if 
there is a second leader election). This, I think will fix the test. Attaching 
the patch for this.

> OverseerTest failures
> ---------------------
>
>                 Key: SOLR-8249
>                 URL: https://issues.apache.org/jira/browse/SOLR-8249
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>         Attachments: SOLR-8249.patch, SOLR-8249.patch, SOLR-8249.patch, 
> SOLR-8249.patch
>
>
> http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/5377/
> This is related to SOLR-7989 . [~ichattopadhyaya] we need to fix the testcase



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to