[ 
https://issues.apache.org/jira/browse/SOLR-16013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492931#comment-17492931
 ] 

Chris M. Hostetter commented on SOLR-16013:
-------------------------------------------

I started digging into this because Joel and I noticed a weird situation that 
would pop up occasionally when {{ADDREPLICA}} commands were sent to a 
cluster while nodes were shutting down (or restarting; this is in 
Kubernetes).  Sometimes N {{ADDREPLICA}} commands would become N+1 {{CREATE}} 
core commands. We traced the logs down to the overseer logging that it's 
adding a replica around the same time that it starts shutting down; then a new 
node becomes the overseer and also logs that it's adding a replica, before the 
original overseer has logged that it's finished.

----

It seems pretty straightforward to me that {{ZkController}} should wait for 
{{IOUtils.closeQuietly(overseer)}} to complete before calling 
{{IOUtils.closeQuietly(overseerElector.getContext())}} ... does anyone have any 
idea why this _isn't_ the case?
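
One way to get that ordering might be to run both closes inside a single submitted task, so the election node is only released once the overseer has finished closing. A minimal sketch of the idea, assuming stand-in {{Closeable}} objects and a local {{closeQuietly}} helper instead of the real Solr classes (every name in it is illustrative, not the actual {{ZkController}} code):

```java
import java.io.Closeable;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class OrderedShutdownSketch {

  // Hypothetical stand-in for IOUtils.closeQuietly: close and swallow any exception.
  static void closeQuietly(Closeable c) {
    try {
      if (c != null) {
        c.close();
      }
    } catch (Exception e) {
      // Intentionally ignored; this is the "quietly" part.
    }
  }

  // Runs the two closes in order and returns the sequence of events that occurred.
  static List<String> shutdown() throws Exception {
    List<String> events = new CopyOnWriteArrayList<>();

    // Stand-in for the Overseer: close() is slow because it waits for in-flight work.
    Closeable overseer = () -> {
      try {
        Thread.sleep(100);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      events.add("overseer closed");
    };

    // Stand-in for overseerElector.getContext(): close() gives up the election node.
    Closeable electionContext = () -> events.add("election node released");

    ExecutorService customThreadPool = Executors.newFixedThreadPool(2);
    try {
      // One task, two sequential closes: the election node cannot be released
      // until the overseer close has returned.
      Future<?> done = customThreadPool.submit(() -> {
        closeQuietly(overseer);
        closeQuietly(electionContext);
      });
      done.get(5, TimeUnit.SECONDS);
    } finally {
      customThreadPool.shutdown();
    }
    return events;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(shutdown());
  }
}
```

With the two closes submitted as separate tasks, the fast election-context close can finish while the slow overseer close is still waiting; chaining them in one task removes that window, at the cost of a slightly longer shutdown. Whether that trade-off is acceptable in the real shutdown path is an open question.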

> Overseer gives up election node before closing - inflight commands can be 
> processed twice
> -----------------------------------------------------------------------------------------
>
>                 Key: SOLR-16013
>                 URL: https://issues.apache.org/jira/browse/SOLR-16013
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Chris M. Hostetter
>            Priority: Major
>
> {{ZkController}} shutdown currently has these two lines (in this order)...
> {code:java}
>     customThreadPool.submit(() -> IOUtils.closeQuietly(overseerElector.getContext()));
>     customThreadPool.submit(() -> IOUtils.closeQuietly(overseer));
> {code}
> AFAICT this means that the overseer nodeX will give up its 
> election node (via overseerElector), allowing some other nodeY to be elected 
> as the new overseer, **BEFORE** nodeX shuts down its {{Overseer}} object, 
> which waits for the {{OverseerThread}} to finish processing any tasks in 
> progress.
> In practice, this seems to make it possible for a single command in the 
> overseer queue to get processed twice.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
