[
https://issues.apache.org/jira/browse/GEODE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236486#comment-17236486
]
Dan Smith commented on GEODE-8739:
----------------------------------
I think I see the problem. If we fall into *findCoordinatorFromView*, the
locator chooses *itself* as the best coordinator.
{code}
if (localAddress.preferredForCoordinator()) {
  // it's possible that all other potential coordinators are gone
  // and this new member must become the coordinator
  bestGuessCoordinator = localAddress;
}
{code}
Even though it got a response from the other locator, it ignores that response
because it had already tried that locator once, when it was not yet the
coordinator:
{noformat}
if (!localAddress.equals(suggestedCoordinator)
    && !state.alreadyTried.contains(suggestedCoordinator)) {
{noformat}
The regular findCoordinator logic doesn't seem to do this; it only happens in
findCoordinatorFromView. It looks like we only get into findCoordinatorFromView
if we recovered a view from a .dat file.
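To make the interaction of the two snippets concrete, here is a minimal, self-contained sketch. It is not the Geode implementation: the class name, the chooseFromView method, and the use of plain strings for member addresses are all hypothetical simplifications. It only models the two conditions quoted above, defaulting to the local address and then skipping any suggested coordinator that is already in alreadyTried.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the behavior described above; not Geode code.
public class CoordinatorChoiceSketch {

  // Assumption: members are represented as plain strings and
  // "preferredForCoordinator" is passed in as a boolean.
  static String chooseFromView(String localAddress, boolean localPreferred,
      List<String> suggestedCoordinators, Set<String> alreadyTried) {
    String bestGuessCoordinator = null;
    if (localPreferred) {
      // Mirrors the first snippet: the local member is the default choice.
      bestGuessCoordinator = localAddress;
    }
    for (String suggested : suggestedCoordinators) {
      // Mirrors the second snippet: a response naming a member we already
      // tried is dropped, even if that member is the coordinator by now.
      if (!localAddress.equals(suggested) && !alreadyTried.contains(suggested)) {
        bestGuessCoordinator = suggested;
      }
    }
    return bestGuessCoordinator;
  }

  public static void main(String[] args) {
    List<String> responses = Arrays.asList("locator-1");

    // locator-1 was tried earlier, so its response is ignored.
    Set<String> tried = new HashSet<>(Arrays.asList("locator-1"));
    System.out.println(chooseFromView("locator-0", true, responses, tried));

    // Without the alreadyTried filter, locator-1 would be chosen.
    System.out.println(chooseFromView("locator-0", true, responses, new HashSet<>()));
  }
}
```

In this model the first call returns the local member (locator-0) and the second returns locator-1. If both locators end up in the first situation at the same time, each picks itself, which would match the split brain reported in this issue.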
> Split brain when locators exhaust join attempts on non-existent servers
> -----------------------------------------------------------------------
>
> Key: GEODE-8739
> URL: https://issues.apache.org/jira/browse/GEODE-8739
> Project: Geode
> Issue Type: Bug
> Components: membership
> Reporter: Jason Huynh
> Priority: Major
> Attachments: exportedLogs_locator-0.zip, exportedLogs_locator-1.zip
>
>
> The hypothesis: "if there is a locator view .dat file with several
> non-existent servers then the locators will waste all of their join attempts
> on the servers instead of finding each other"
> The scenario is that a test/user attempts to recreate a cluster with existing
> .dat and persistent files. The locators are spun up in parallel and, from the
> analysis, it looks like they are able to communicate with each other, but then
> end up forming their own distributed system (ds).