[ 
https://issues.apache.org/jira/browse/SOLR-17652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923822#comment-17923822
 ] 

Chris M. Hostetter commented on SOLR-17652:
-------------------------------------------

FYI, here's a way to demonstrate the bug using Solr's example mode (the 
commands below are from 9x and need to be modified slightly to work on main)...

 
{noformat}
./solr/packaging/build/dev/bin/solr start -e cloud -noprompt

# Setup our collection and both types of replicas
curl -sS \
  'http://localhost:8983/solr/admin/collections?action=CREATE&name=techproducts&numShards=1&tlogReplicas=1&createNodeSet=localhost:7574_solr'
curl -sS \
  'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=techproducts&shard=shard1&node=localhost:8983_solr&type=PULL'

./solr/packaging/build/dev/bin/post -c techproducts \
  ./solr/packaging/build/dev/example/exampledocs/*.xml

# shut down pod hosting TLOG leader
./solr/packaging/build/dev/bin/solr stop -p 7574

# stop & re-start pod hosting PULL replica (and embedded zk)

./solr/packaging/build/dev/bin/solr stop -p 8983

./solr/packaging/build/dev/bin/solr start --cloud -p 8983 \
  --solr-home "/home/hossman/lucene/solr/solr/packaging/build/dev/example/cloud/node1/solr" \
  --server-dir "/home/hossman/lucene/solr/solr/packaging/build/dev/server"

# Wait ~13 min (until you see an exception like the one above in the logs)

# Bring back the TLOG leader pod...
./solr/packaging/build/dev/bin/solr start --cloud -p 7574 \
  --solr-home "/home/hossman/lucene/solr/solr/packaging/build/dev/example/cloud/node2/solr" \
  --server-dir "/home/hossman/lucene/solr/solr/packaging/build/dev/server" \
  -z 127.0.0.1:9983

# PULL replica will still stay DOWN forever

{noformat}
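
The "Wait ~13 min" step in the commands above can be automated rather than watched by hand. The following is only a rough sketch: wait_for_log is a hypothetical helper (not part of Solr), the 900 second default is just an assumed margin over the ~13 minutes mentioned above, and both the log file location and the exception text to match are things you have to supply yourself, since neither is spelled out here.

```shell
# Hypothetical helper (not part of Solr): poll a log file until a pattern
# shows up, or give up once the timeout has elapsed.
# Usage: wait_for_log PATTERN FILE [TIMEOUT_SECS] [INTERVAL_SECS]
wait_for_log() {
  wfl_pattern=$1
  wfl_file=$2
  wfl_timeout=${3:-900}   # assumed default: a bit more than ~13 min
  wfl_interval=${4:-10}   # assumed default: seconds between checks
  wfl_elapsed=0
  while [ "$wfl_elapsed" -lt "$wfl_timeout" ]; do
    # -F matches the pattern as a fixed string; a missing file is
    # treated the same as "not found yet"
    if grep -qF -- "$wfl_pattern" "$wfl_file" 2>/dev/null; then
      return 0
    fi
    sleep "$wfl_interval"
    wfl_elapsed=$((wfl_elapsed + wfl_interval))
  done
  return 1
}

# Example call (both variables are placeholders you would set yourself):
# wait_for_log "$EXCEPTION_TEXT" "$SOLR_LOG" && echo "exception seen"
```

The helper returns 0 as soon as the text appears and 1 on timeout, so it can gate the "bring back the TLOG leader" step in a script.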
 

> PULL replicas can be stuck permanently in DOWN state if leader election takes 
> too long
> --------------------------------------------------------------------------------------
>
>                 Key: SOLR-17652
>                 URL: https://issues.apache.org/jira/browse/SOLR-17652
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Chris M. Hostetter
>            Assignee: Chris M. Hostetter
>            Priority: Major
>         Attachments: SOLR-17652.patch
>
>
> A bug exists in {{ZkController}} that can cause PULL replicas to be 
> permanently stuck in a DOWN state (such that even a core RELOAD cannot fix 
> it) if that PULL replica was initially loaded during a leader election that 
> takes a significant amount of time.
>  
> Details to follow in comments



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
