Shalin Shekhar Mangar created SOLR-9504:
-------------------------------------------

             Summary: A replica with an empty index becomes the leader even 
when other more qualified replicas are in line
                 Key: SOLR-9504
                 URL: https://issues.apache.org/jira/browse/SOLR-9504
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud
    Affects Versions: master (7.0)
            Reporter: Shalin Shekhar Mangar
            Priority: Critical
             Fix For: 6.3, master (7.0)


I haven't tried branch_6x or any release yet. But this is trivially 
reproducible on master with the following steps:
# Start two solr nodes
# Create a collection with 1 shard, 1 replica so that one node is empty.
# Index some documents
# Shut down the leader node
# Use the addreplica API to create a replica of the collection on the 
still-running node. For some reason, this API hangs until you restart the other 
node (possibly a bug in itself), but do not wait for the API to complete.
# Restart the former leader node

You'll find that the replica with 0 docs has become the leader. The former 
leader then recovers from the new leader without replicating any index files, 
so it still has the old index, which contains some docs.

This is from the logs of the 0 doc replica:
{code}
713102 INFO  (zkCallback-4-thread-5-processing-n:127.0.1.1:7574_solr) [   ] 
o.a.s.c.c.ZkStateReader Updating data for [gettingstarted] from [9] to [10]
714377 INFO  (qtp110456297-15) [c:gettingstarted s:shard1 r:core_node2 
x:gettingstarted_shard1_replica2] o.a.s.c.ShardLeaderElectionContext Enough 
replicas found to continue.
714377 INFO  (qtp110456297-15) [c:gettingstarted s:shard1 r:core_node2 
x:gettingstarted_shard1_replica2] o.a.s.c.ShardLeaderElectionContext I may be 
the new leader - try and sync
714377 INFO  (qtp110456297-15) [c:gettingstarted s:shard1 r:core_node2 
x:gettingstarted_shard1_replica2] o.a.s.c.SyncStrategy Sync replicas to 
http://127.0.1.1:7574/solr/gettingstarted_shard1_replica2/
714380 INFO  (qtp110456297-15) [c:gettingstarted s:shard1 r:core_node2 
x:gettingstarted_shard1_replica2] o.a.s.u.PeerSync PeerSync: 
core=gettingstarted_shard1_replica2 url=http://127.0.1.1:7574/solr START 
replicas=[http://127.0.1.1:8983/solr/gettingstarted_shard1_replica1/] 
nUpdates=100
714381 INFO  (qtp110456297-15) [c:gettingstarted s:shard1 r:core_node2 
x:gettingstarted_shard1_replica2] o.a.s.u.PeerSync PeerSync: 
core=gettingstarted_shard1_replica2 url=http://127.0.1.1:7574/solr DONE.  We 
have no versions.  sync failed.
714382 INFO  (qtp110456297-15) [c:gettingstarted s:shard1 r:core_node2 
x:gettingstarted_shard1_replica2] o.a.s.c.SyncStrategy Leader's attempt to sync 
with shard failed, moving to the next candidate
714382 INFO  (qtp110456297-15) [c:gettingstarted s:shard1 r:core_node2 
x:gettingstarted_shard1_replica2] o.a.s.c.ShardLeaderElectionContext We failed 
sync, but we have no versions - we can't sync in that case - we were active 
before, so become leader anyway
714387 INFO  (qtp110456297-15) [c:gettingstarted s:shard1 r:core_node2 
x:gettingstarted_shard1_replica2] o.a.s.c.ShardLeaderElectionContextBase 
Creating leader registration node 
/collections/gettingstarted/leaders/shard1/leader after winning as 
/collections/gettingstarted/leader_elect/shard1/election/96579592334475268-core_node2-n_0000000001
714398 INFO  (qtp110456297-15) [c:gettingstarted s:shard1 r:core_node2 
x:gettingstarted_shard1_replica2] o.a.s.c.ShardLeaderElectionContext I am the 
new leader: http://127.0.1.1:7574/solr/gettingstarted_shard1_replica2/ shard1
{code}

It basically tries to sync but has no versions, and because the election logic 
considers it to have been active before (even though it is a new core starting 
up for the first time), it becomes the leader and publishes itself as active.
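The decision path visible in the log lines above can be sketched as follows. 
This is a hypothetical simplification for illustration only, not Solr's actual 
ShardLeaderElectionContext code; the method and class names are invented:

{code}
// Simplified sketch (hypothetical) of the leader-election decision seen in the
// logs: a candidate that fails PeerSync but has no versions at all is still
// allowed to win if it was previously published as active.
class LeaderElectionSketch {
    static boolean becomesLeader(boolean syncSucceeded,
                                 boolean hasVersions,
                                 boolean wasActiveBefore) {
        if (syncSucceeded) {
            return true; // normal case: sync with replicas succeeded
        }
        // The problematic branch: an empty index means "no versions", so the
        // sync failure is waved through and the replica wins anyway.
        if (!hasVersions && wasActiveBefore) {
            return true;
        }
        return false; // otherwise, move to the next candidate
    }

    public static void main(String[] args) {
        // A fresh core with an empty index: sync failed, no versions, but it
        // was published as active -- so it becomes the leader.
        System.out.println(becomesLeader(false, false, true));
    }
}
{code}

The trouble is that "no versions" here describes both a replica that never saw 
an update and a brand-new empty core, so the empty replica wins over a 
better-qualified one.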



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
