[ https://issues.apache.org/jira/browse/GEODE-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524636#comment-17524636 ]

Donal Evans commented on GEODE-9880:
------------------------------------

Some preliminary findings and questions, following an investigation of this
issue and a discussion with [~burcham], who knows the membership code better
than Patrick or me:

On the client, if we have a locator defined with only an IP address and the
same locator is returned in the locator response with only a hostname, then it
is not possible to detect the duplicate without either a forward or a reverse
DNS lookup. Because of this, there is no way to prevent the hostname-only
locator from being added to the client's list of locators, then being used and
causing the NPE described in the original report.
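
A minimal Java sketch of why the duplicate goes undetected, using only
java.net.InetSocketAddress. The IP address, hostname, and port are hypothetical
placeholders, and the plain equals() comparison stands in for whatever check
the client's locator list actually performs (an assumption, not Geode's code).
The IP-only entry is parsed without any DNS query, while the hostname-only
entry is never resolved, so the two share nothing they can be matched on.

```java
import java.net.InetSocketAddress;

class LocatorDuplicateCheck {
    static final String LOCATOR_IP = "192.0.2.10";  // hypothetical IPv4 address
    static final String LOCATOR_HOST = "locator-1"; // hypothetical unresolvable hostname
    static final int LOCATOR_PORT = 10334;          // placeholder port

    // Locator as configured on the client: IP address only. For an IP literal
    // the constructor only parses the address; no DNS query is made.
    static InetSocketAddress configuredLocator() {
        return new InetSocketAddress(LOCATOR_IP, LOCATOR_PORT);
    }

    // The same locator as returned in a locator response: hostname only.
    // createUnresolved never attempts a lookup.
    static InetSocketAddress returnedLocator() {
        return InetSocketAddress.createUnresolved(LOCATOR_HOST, LOCATOR_PORT);
    }

    // Naive duplicate check, assumed for illustration only.
    static boolean isDuplicate(InetSocketAddress a, InetSocketAddress b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        InetSocketAddress configured = configuredLocator();
        InetSocketAddress returned = returnedLocator();
        System.out.println("configured unresolved? " + configured.isUnresolved());
        System.out.println("returned unresolved?   " + returned.isUnresolved());
        // Without a forward or reverse lookup there is nothing to compare.
        System.out.println("duplicate detected?    " + isDuplicate(configured, returned));
    }
}
```

Run as-is, it prints false, true and false: the configured entry is resolved,
the returned entry is not, and they do not compare equal.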

If hostname-for-clients is configured and set to an IP address, we follow the
code path shown in [the stack trace in this comment|https://issues.apache.org/jira/browse/GEODE-9880?focusedCommentId=17460501&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17460501].
The SNIHostName constructor invoked in SocketCreator requires a valid domain
name. We attempt to resolve IP addresses to hostnames before invoking that
constructor, but if we can't, we use the IP address as the hostname. For IPv4
this succeeds, because an IPv4 address has the same format as a valid domain
name (labels separated by periods), so we are able to create the SNIHostName
and set it, even though we may not be using SNI. For IPv6 the constructor
throws, because the colons in an IPv6 address are not valid domain name
characters, as seen in the above stack trace.
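
The IPv4/IPv6 difference can be reproduced with the JDK alone. This sketch
relies only on the public javax.net.ssl.SNIHostName constructor, which
validates its argument as an RFC 3490 domain name and throws
IllegalArgumentException otherwise; the helper name and the sample addresses
are hypothetical.

```java
import javax.net.ssl.SNIHostName;

class SniHostNameCheck {
    // Returns true if the JDK accepts the string as an SNI host name.
    // Anything that is not a valid domain name makes the constructor throw
    // IllegalArgumentException.
    static boolean acceptedAsSniHostName(String host) {
        try {
            new SNIHostName(host);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical inputs: a hostname, an IPv4 literal, and an IPv6 literal.
        String[] hosts = {"locator-1.example.com", "192.0.2.10", "2001:db8::1"};
        for (String host : hosts) {
            System.out.println(host + " -> " + acceptedAsSniHostName(host));
        }
    }
}
```

Under these assumptions the hostname and the IPv4 literal are accepted, and the
IPv6 literal is rejected because ':' is not a letter, digit, or hyphen.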

From 1.14 onward, the code in both these areas has been reworked
significantly, so it appears that the originally described NPE may no longer be
possible, although the client may still be unable to contact the locator or may
hit an exception elsewhere.

Questions:

Should we make the use of SNIHostName conditional on whether SNI is actually
in use? This might allow the hostname-for-clients workaround to work in IPv6
environments, but it might not solve the problem if the user wanted to use SNI
*and* could not resolve hostnames to IP addresses (or vice versa) on the
client.

Should working name resolution be required in all cases? Is it a valid 
configuration of Geode to allow clients to connect to a cluster without being 
able to access the DNS used by members of the cluster?
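
One possible shape for the first question, sketched against the JDK's
SSLParameters API rather than Geode's SocketCreator. The sniEnabled flag and
the applyServerName helper are hypothetical, not existing Geode settings or
methods. When SNI is off, the SNIHostName constructor is never reached, so an
IPv6 literal is harmless; when SNI is on, the same literal still throws, which
is the unresolved case raised in that question.

```java
import java.util.List;
import javax.net.ssl.SNIHostName;
import javax.net.ssl.SSLParameters;

class ConditionalSni {
    /**
     * Hypothetical helper: sets the SNI server name only when SNI is in use.
     * The sniEnabled flag is an assumption for illustration, not a Geode property.
     *
     * @return true if a server name was set on params
     */
    static boolean applyServerName(SSLParameters params, String host, boolean sniEnabled) {
        if (!sniEnabled) {
            // Skip SNIHostName entirely, so an IPv6 literal cannot make it throw.
            return false;
        }
        // With SNI in use, an IPv6 literal still throws IllegalArgumentException
        // here; this is the case the first question leaves open.
        params.setServerNames(List.of(new SNIHostName(host)));
        return true;
    }

    public static void main(String[] args) {
        SSLParameters params = new SSLParameters();
        // IPv6 literal with SNI disabled: no exception and no server name set.
        System.out.println(applyServerName(params, "2001:db8::1", false));
        // Hostname with SNI enabled: the server name is set.
        System.out.println(applyServerName(params, "locator-1.example.com", true));
        System.out.println(params.getServerNames());
    }
}
```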

> Cluster with multiple locators in an environment with no host name 
> resolution, leads to null pointer exception
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-9880
>                 URL: https://issues.apache.org/jira/browse/GEODE-9880
>             Project: Geode
>          Issue Type: Bug
>          Components: locator, membership
>    Affects Versions: 1.12.5
>            Reporter: Tigran Ghahramanyan
>            Assignee: Patrick Johnsn
>            Priority: Major
>              Labels: blocks-1.12.10, blocks-1.15.0, membership, pull-request-available
>
> In our use case we have two locators that are initially configured with IP
> addresses, but the _AutoConnectionSourceImpl.UpdateLocatorList()_ flow keeps
> adding their corresponding host names to the locators list, while these host
> names are not resolvable.
> Later, in {_}AutoConnectionSourceImpl.queryLocators(){_}, whenever a client
> tries to use such a non-resolvable host name to connect to a locator, it tries
> to establish a connection to {_}socketaddr=0.0.0.0{_}, as written in
> {_}SocketCreator.connect(){_}, which seems strange.
> Then, if there is no locator running on the same host, the next locator in
> the list is contacted, until a locator entry configured with an IP address is
> reached, which eventually succeeds.
> But when a locator happens to be listening on the same host, we get a null
> pointer exception in the second line below, because _inetadd=null_:
> _socket.connect(sockaddr, Math.max(timeout, 0)); // sockaddr=0.0.0.0, 
> connects to a locator listening on the same host_
> _configureClientSSLSocket(socket, inetadd.getHostName(), timeout); // inetadd 
> = null_
>  
> As a result, the cluster ends up in a failed state and is unable to recover.
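
The socketaddr=0.0.0.0 observation in the report is consistent with documented
JDK behavior: the InetSocketAddress(InetAddress, int) constructor replaces a
null address with the wildcard address. The sketch below assumes, without
having confirmed it against the 1.12 SocketCreator source, that the failed
lookup leaves inetadd null and that this null is passed to that constructor.
The guardedAddress helper is hypothetical and shows one way to fail fast with
an UnknownHostException instead of connecting to the wildcard address and later
dereferencing the null inetadd.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

class WildcardAddressDemo {
    // Mirrors the assumed failure mode: a null InetAddress goes straight into
    // the constructor, which substitutes the wildcard address (0.0.0.0, or ::
    // when IPv6 addresses are preferred).
    static InetSocketAddress unguardedAddress(InetAddress inetadd, int port) {
        return new InetSocketAddress(inetadd, port);
    }

    // Hypothetical guard: refuse to build an address for a host that did not
    // resolve, so neither the wildcard connect nor the later
    // inetadd.getHostName() NPE can happen.
    static InetSocketAddress guardedAddress(InetAddress inetadd, String host, int port)
            throws UnknownHostException {
        if (inetadd == null) {
            throw new UnknownHostException(host);
        }
        return new InetSocketAddress(inetadd, port);
    }

    public static void main(String[] args) {
        InetAddress inetadd = null; // placeholder for a host name that did not resolve
        InetSocketAddress sockaddr = unguardedAddress(inetadd, 10334);
        System.out.println("wildcard address? " + sockaddr.getAddress().isAnyLocalAddress());
        try {
            guardedAddress(inetadd, "locator-1", 10334);
        } catch (UnknownHostException e) {
            System.out.println("guarded: cannot resolve " + e.getMessage());
        }
    }
}
```

As the report notes, a connect to the wildcard address reached a locator on
the same host; the guard trades that silent misdirection for an explicit
resolution error.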



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
