[
https://issues.apache.org/jira/browse/GEODE-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201796#comment-17201796
]
ASF GitHub Bot commented on GEODE-8522:
---------------------------------------
Bill commented on pull request #5545:
URL: https://github.com/apache/geode/pull/5545#issuecomment-698610478
@upthewaterspout so when two locators are starting up simultaneously, I
suppose `locatorClient.requestToServer()` is throwing `IOException` (because
some other locator on the `locators` list hasn't started yet).
Your comments imply that starting two locators at once is "normal". [Looking
at the
docs](https://geode.apache.org/docs/guide/12/configuring/running/starting_up_shutting_down.html),
I suppose it is, provided you specify `locator-wait-time` (so as to avoid
split-brain.)
Oddly, the docs mention:
> …an info-level message
>
> GemFire startup was unable to contact a locator. Waiting for one to
> start. Configured locators are frodo[12345],pippin[12345].
I don't find that message text anywhere in the Geode source today (searched
for "unable to contact a locator" and "Configured locators are"). It makes me
think that the docs might be talking about an old version of the very message
that this PR is changing (from info level down to debug level). The logged
message, on the `develop` branch and in this PR (unchanged), is:
> Exception thrown when contacting a locator
If that's right (that the docs are talking about the message modified by
this PR), then we need a doc change to go with this code change.
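
To make the log-level question concrete, here is a minimal sketch of the
general pattern under discussion: a short one-line notice at info level and
the full stack trace only at debug level. It is not the code in this PR.
It uses `java.util.logging` (where `FINE` roughly plays the role of debug)
instead of Geode's own logging, and `LocatorRequest`, `findReachableLocator`,
`CapturingHandler`, and the "Unable to contact locator" text are all
hypothetical names invented for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class LocatorContactLogging {

  /** Hypothetical stand-in for a call such as locatorClient.requestToServer(). */
  interface LocatorRequest {
    void send(String locator) throws IOException;
  }

  /** Tries each locator in order; returns the first that answers, or null if none do. */
  static String findReachableLocator(List<String> locators, LocatorRequest request,
      Logger logger) {
    for (String locator : locators) {
      try {
        request.send(locator);
        return locator;
      } catch (IOException e) {
        // Expected while peers are still starting: one line at INFO, no stack trace.
        logger.log(Level.INFO, "Unable to contact locator {0}: {1}",
            new Object[] {locator, e.toString()});
        // Full stack trace only at FINE (assumed here to stand in for debug level).
        logger.log(Level.FINE, "Exception thrown when contacting a locator", e);
      }
    }
    return null;
  }

  /** Collects log records so the example can inspect what was emitted. */
  static final class CapturingHandler extends Handler {
    final List<LogRecord> records = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
      if (isLoggable(record)) {
        records.add(record);
      }
    }

    @Override
    public void flush() {}

    @Override
    public void close() {}
  }

  public static void main(String[] args) {
    Logger logger = Logger.getLogger("locator.startup.sketch");
    logger.setUseParentHandlers(false);
    logger.setLevel(Level.INFO); // FINE records are dropped at this level
    CapturingHandler handler = new CapturingHandler();
    logger.addHandler(handler);

    // Simulate "frodo" not having started yet while "pippin" is already up.
    LocatorRequest request = locator -> {
      if (locator.startsWith("frodo")) {
        throw new IOException("No route to host");
      }
    };
    String found = findReachableLocator(
        Arrays.asList("frodo[12345]", "pippin[12345]"), request, logger);

    System.out.println("reachable locator: " + found);
    for (LogRecord r : handler.records) {
      System.out.println(r.getLevel() + " hasStackTrace=" + (r.getThrown() != null));
    }
  }
}
```

With the logger at `INFO`, only the one-line record is kept; lowering it to
`FINE` also keeps the record carrying the exception. Whether any info-level
line should remain at all is exactly the docs question raised above.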
----------------------------------------------------------------
> Locators log full stack traces of exceptions at info level during normal
> startup
> --------------------------------------------------------------------------------
>
> Key: GEODE-8522
> URL: https://issues.apache.org/jira/browse/GEODE-8522
> Project: Geode
> Issue Type: Bug
> Reporter: Dan Smith
> Assignee: Dan Smith
> Priority: Major
> Labels: pull-request-available
>
> It's normal to configure multiple locators that all refer to each other's
> addresses. When starting up, the first locator that starts up will always log
> an exception failing to talk to other locators.
> {noformat}
> [info 2020/09/22 21:16:16.582 GMT <main> tid=0x1] Exception thrown when contacting a locator
> java.net.NoRouteToHostException: No route to host (Host unreachable)
> at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
> at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
> at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
> at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
> at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
> at java.base/java.net.Socket.connect(Socket.java:609)
> at org.apache.geode.distributed.internal.tcpserver.AdvancedSocketCreatorImpl.connect(AdvancedSocketCreatorImpl.java:102)
> at org.apache.geode.internal.net.SCAdvancedSocketCreator.connect(SCAdvancedSocketCreator.java:51)
> at org.apache.geode.distributed.internal.tcpserver.ClusterSocketCreatorImpl.connect(ClusterSocketCreatorImpl.java:96)
> at org.apache.geode.distributed.internal.tcpserver.TcpClient.getServerVersion(TcpClient.java:262)
> at org.apache.geode.distributed.internal.tcpserver.TcpClient.requestToServer(TcpClient.java:153)
> at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.findCoordinator(GMSJoinLeave.java:1156)
> at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.join(GMSJoinLeave.java:342)
> at org.apache.geode.distributed.internal.membership.gms.GMSMembership.join(GMSMembership.java:568)
> at org.apache.geode.distributed.internal.membership.gms.GMSMembership.access$1300(GMSMembership.java:72)
> at org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.joinDistributedSystem(GMSMembership.java:1974)
> at org.apache.geode.distributed.internal.membership.gms.Services.start(Services.java:242)
> at org.apache.geode.distributed.internal.membership.gms.GMSMembership.start(GMSMembership.java:1853)
> at org.apache.geode.distributed.internal.DistributionImpl.start(DistributionImpl.java:171)
> at org.apache.geode.distributed.internal.DistributionImpl.createDistribution(DistributionImpl.java:222)
> at org.apache.geode.distributed.internal.ClusterDistributionManager.<init>(ClusterDistributionManager.java:464)
> at org.apache.geode.distributed.internal.ClusterDistributionManager.<init>(ClusterDistributionManager.java:497)
> at org.apache.geode.distributed.internal.ClusterDistributionManager.create(ClusterDistributionManager.java:326)
> at org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:779)
> at org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:135)
> at org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:3034)
> at org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:290)
> at org.apache.geode.distributed.internal.InternalLocator.startDistributedSystem(InternalLocator.java:743)
> at org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:388)
> at org.apache.geode.distributed.LocatorLauncher.start(LocatorLauncher.java:716)
> at org.apache.geode.distributed.LocatorLauncher.run(LocatorLauncher.java:623)
> at org.apache.geode.distributed.LocatorLauncher.main(LocatorLauncher.java:217)
> {noformat}
> We shouldn't log full stack traces of exceptions for something that is a
> normal part of the startup process, because it makes it harder to search for
> errors. This is coming from this line in the code, which was switched from
> debug to info in the last year:
> https://github.com/apache/geode/blob/52018fcf1da513c888092775295a121992abcec2/geode-membership/src/main/java/org/apache/geode/distributed/internal/membership/gms/membership/GMSJoinLeave.java#L1200