Pierre Salagnac created SOLR-17107:
--------------------------------------

Summary: Leader election is unpredictable if two threads join concurrently election of the same replica
Key: SOLR-17107
URL: https://issues.apache.org/jira/browse/SOLR-17107
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: SolrCloud
Affects Versions: 9.3, 8.11
Reporter: Pierre Salagnac

There is a race condition in leader election if two threads concurrently run the election for the same replica. This is not about how leader election is distributed across multiple Solr nodes, but about how multiple threads in a single Solr node conflict with each other.

Overall, when two threads on the same server concurrently join leader election for the same replica, the outcome is unpredictable. It may end with two nodes thinking they are the leader, or with no leader at all.

h2. How to reproduce

I identified two scenarios, but there may be more:

*1. ZooKeeper session expires while an election is already in progress.*
When we re-create the ZooKeeper session, we re-register all the cores and join elections for all of them. If an election is already in progress, or is triggered for any reason, we can have two threads on the same Solr server node running leader election for the same core.

*2. Command REJOINLEADERELECTION is received twice concurrently for the same core.*
This scenario is much easier to reproduce with an external client. It occurs for us because we have customizations that use this command.

h2. Full analysis

There are at least two issues in the current code.

*1. We blindly delete ZK nodes that were created by other threads*
Right after we create our ephemeral sequential ZK node to join the election queue, we check whether there are other ZK nodes for the same session ID (that is, from the same Solr server). When other nodes are found, we just delete them, but we don't stop the election for either thread. It is likely that both threads will think they won the election.

In addition, if two threads join the election concurrently, each may delete the sequential node of the other. In the end, no node remains in the queue, so if another node joins the election later, it will miss that there may already be a leader.
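The mutual deletion can be replayed deterministically with a small in-memory model. This is only a sketch: the class {{BlindDeleteRace}} and its methods are hypothetical stand-ins for the election queue and are not Solr or ZooKeeper APIs. It replays one possible interleaving in which both threads create their node before either runs the cleanup.

```java
import java.util.TreeMap;

/** In-memory stand-in for the election queue; hypothetical, not Solr's real code. */
final class BlindDeleteRace {
    // sequence number -> session ID of the creator
    private final TreeMap<Integer, String> queue = new TreeMap<>();
    private int nextSeq = 0;

    /** Models creating an ephemeral sequential node; returns its sequence number. */
    int createNode(String sessionId) {
        int seq = nextSeq++;
        queue.put(seq, sessionId);
        return seq;
    }

    /** Models the problematic cleanup: delete every other node of the same session. */
    void deleteOtherNodesOfSession(String sessionId, int ownSeq) {
        queue.entrySet().removeIf(
            e -> e.getKey() != ownSeq && e.getValue().equals(sessionId));
    }

    int size() {
        return queue.size();
    }

    /** Replays one interleaving of two threads A and B sharing a session. */
    static int replayInterleaving() {
        BlindDeleteRace q = new BlindDeleteRace();
        String session = "session-1";          // assumed session ID, for illustration
        int a = q.createNode(session);         // thread A joins the queue
        int b = q.createNode(session);         // thread B joins the queue
        q.deleteOtherNodesOfSession(session, a); // A removes B's node
        q.deleteOtherNodesOfSession(session, b); // B removes A's node
        return q.size();
    }

    public static void main(String[] args) {
        // Prints "nodes left in queue: 0": both threads carry on, yet the queue is empty.
        System.out.println("nodes left in queue: " + replayInterleaving());
    }
}
```

In the real code the two steps run on separate threads, so this interleaving is only one of several possible outcomes.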
The fix for this issue would be to have one of the two threads abort the election without deleting the node of the other thread. The election process should be continued only by the thread with the smallest sequence number in the queue.

*2. Mutability around {{LeaderElector}} and contexts*
Another issue is that any thread can change the context of a {{LeaderElector}} instance. This can be done either by invoking {{setup()}} (mostly after ZK session expiration) or {{retryElection()}}. When we change the context, the old one is closed, but we don't take into account the exact state of the election if another thread is currently joining with the old context.

I am not sure exactly what the fix for this would be.
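For the first issue, the rule proposed above (only the thread with the smallest sequence number continues) could look roughly like the following in-memory sketch. {{SmallestSequenceWins}} and its methods are hypothetical names, not Solr APIs. Having the aborting thread remove its own node is an assumption made here for illustration; the description above only says that it must not delete the other thread's node.

```java
import java.util.TreeMap;

/** In-memory stand-in for the election queue; hypothetical, not Solr's real code. */
final class SmallestSequenceWins {
    // sequence number -> session ID of the creator
    private final TreeMap<Integer, String> queue = new TreeMap<>();
    private int nextSeq = 0;

    /** Models creating an ephemeral sequential node; returns its sequence number. */
    synchronized int createNode(String sessionId) {
        int seq = nextSeq++;
        queue.put(seq, sessionId);
        return seq;
    }

    /**
     * Returns true if the caller holds the smallest sequence number for its
     * session and should continue the election. Otherwise the caller aborts
     * and removes only its own node (assumption), never the other thread's.
     */
    synchronized boolean continueOrAbort(String sessionId, int ownSeq) {
        boolean olderNodeExists = queue.headMap(ownSeq).containsValue(sessionId);
        if (olderNodeExists) {
            queue.remove(ownSeq);
            return false;
        }
        return true;
    }

    synchronized int size() {
        return queue.size();
    }

    /** Replays two joiners on one session; returns {threads continuing, nodes left}. */
    static int[] replay() {
        SmallestSequenceWins q = new SmallestSequenceWins();
        String session = "session-1";   // assumed session ID, for illustration
        int a = q.createNode(session);
        int b = q.createNode(session);
        boolean bContinues = q.continueOrAbort(session, b); // B sees A's older node
        boolean aContinues = q.continueOrAbort(session, a); // A holds the smallest
        int continuing = (aContinues ? 1 : 0) + (bContinues ? 1 : 0);
        return new int[] {continuing, q.size()};
    }

    public static void main(String[] args) {
        int[] r = replay();
        // Prints "threads continuing: 1, nodes left: 1"
        System.out.println("threads continuing: " + r[0] + ", nodes left: " + r[1]);
    }
}
```

Whichever thread runs its check first, exactly one thread continues and its node stays in the queue, so a node joining later still sees an entry. This sketch does not address the second issue, the mutable context.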