[ https://issues.apache.org/jira/browse/SOLR-16013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492931#comment-17492931 ]
Chris M. Hostetter commented on SOLR-16013:
-------------------------------------------

I started digging into this because Joel and I noticed a weird situation that would pop up occasionally when {{ADDREPLICA}} commands were sent to a cluster while nodes were shutting down (or restarting; this is in Kubernetes). Sometimes N {{ADDREPLICA}} commands would become N+1 {{CREATE}} core commands. We traced the logs down to the overseer logging that it is adding a replica around the same time that it starts shutting down; then a new node becomes the overseer and also says it is adding a replica before the original overseer has logged that it has finished.

----

It seems pretty straightforward to me that {{ZkController}} should wait for {{IOUtils.closeQuietly(overseer)}} to complete before calling {{IOUtils.closeQuietly(overseerElector.getContext())}} ... does anyone have any idea why this _isn't_ the case?

> Overseer gives up election node before closing - inflight commands can be
> processed twice
> -----------------------------------------------------------------------------------------
>
>                 Key: SOLR-16013
>                 URL: https://issues.apache.org/jira/browse/SOLR-16013
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Chris M. Hostetter
>            Priority: Major
>
> {{ZkController}} shutdown currently has these two lines (in this order)...
> {code:java}
>     customThreadPool.submit(() ->
>         IOUtils.closeQuietly(overseerElector.getContext()));
>     customThreadPool.submit(() -> IOUtils.closeQuietly(overseer));
> {code}
> AFAICT this means that the overseer nodeX will give up its election node
> (via overseerElector), allowing some other nodeY to be elected as the new
> overseer, **BEFORE** overseer nodeX shuts down its {{Overseer}} object,
> which waits for the {{OverseerThread}} to finish processing any tasks in
> process.
> In practice, this seems to make it possible for a single command in the
> overseer queue to get processed twice.
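
To illustrate the ordering suggested in the comment, here is a minimal, self-contained sketch. It does not use any Solr classes: the {{overseer}} and {{electionContext}} objects below are hypothetical stand-ins implemented as plain {{Closeable}} lambdas, and {{closeQuietly}} is a local helper rather than Solr's {{IOUtils}}. The only point it makes is that running both closes in one submitted task guarantees the election node is released after the overseer has finished closing. Whether this is the right fix for {{ZkController}} is an open question in the issue.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShutdownOrderSketch {

  /** Closes both resources in one task, so the second close cannot start before the first returns. */
  static List<String> closeInOrder() throws Exception {
    List<String> events = new CopyOnWriteArrayList<>();

    // Hypothetical stand-in for the Overseer: close() blocks until in-flight work is done.
    Closeable overseer = () -> {
      try {
        Thread.sleep(50); // pretend a queued command is still being processed
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      events.add("overseer closed");
    };

    // Hypothetical stand-in for the election context: close() gives up the election node.
    Closeable electionContext = () -> events.add("election node released");

    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      // A single task instead of two independent submits: the order is now deterministic.
      pool.submit(() -> {
        closeQuietly(overseer);
        closeQuietly(electionContext);
      }).get(5, TimeUnit.SECONDS);
    } finally {
      pool.shutdown();
    }
    return List.copyOf(events);
  }

  /** Local helper (not Solr's IOUtils): closes a resource and swallows any IOException. */
  static void closeQuietly(Closeable c) {
    try {
      if (c != null) {
        c.close();
      }
    } catch (IOException ignored) {
      // intentionally ignored, mirroring "close quietly" semantics
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(closeInOrder()); // prints [overseer closed, election node released]
  }
}
```

With two separate {{submit}} calls, as in the snippet quoted from the issue, the two closes may run concurrently on different pool threads, so nothing prevents the election node from being released while the overseer is still draining its work.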