[ https://issues.apache.org/jira/browse/GEODE-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jakov Varenina updated GEODE-9025:
----------------------------------
Description:
When running Apache Geode in Kubernetes, a ClassCastException is sometimes
thrown during locator discovery. The exception occurs when the locator tries to
cast the received Object to a RemoteLocatorJoinResponse. The locator discovery
thread then stops, so locator discovery never completes successfully. The only
way to trigger locator discovery again is to restart the locator.
*The root cause of this issue is the following:*
If the locator gets an EOFException when sending a VersionRequest message, it
automatically assumes that the remote locator is running an old version of
Geode that does not support the VersionRequest message. The locator then uses
the oldest known version and sends a RemoteLocatorJoinRequest to the remote
locator. The locator then tries to read the response as follows:
{code:java}
public Object requestToServer(HostAndPort addr, Object request, int timeout,
    boolean replyExpected) throws IOException, ClassNotFoundException {
  ...
  Object response = objectDeserializer.readObject(versionedDataInputStream);
  logger.debug("received response: {}", response);
  return response;
}
{code}
Because the locator reads the response as an Object, it will accept any type of
object. The function requestToServer() then returns an object that causes a
ClassCastException, since it is not of type RemoteLocatorJoinResponse:
{code:java}
public void exchangeRemoteLocators() {
  ...
  RemoteLocatorJoinResponse response = (RemoteLocatorJoinResponse) locatorClient
      .requestToServer(locatorId.getHost(), request,
          WAN_LOCATOR_CONNECTION_TIMEOUT, true);
{code}
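To illustrate the failure mode, here is a minimal sketch of checking the
response type before using it, so that an unexpected object surfaces as an
IOException rather than a ClassCastException. The names SafeResponseCast,
JoinResponse and expectType are hypothetical and are not part of Geode;
JoinResponse only stands in for RemoteLocatorJoinResponse, and treating a
mismatch as an IOException is an assumption about how the caller would want
to handle it.

```java
import java.io.IOException;

// Hypothetical helper, not part of Geode: validates a deserialized response
// before it is treated as a specific type.
class SafeResponseCast {

  // Stand-in for RemoteLocatorJoinResponse so the sketch compiles on its own.
  static final class JoinResponse {
    final String locators;

    JoinResponse(String locators) {
      this.locators = locators;
    }
  }

  // Returns the response as the expected type. A null or mismatched response
  // is reported as an IOException instead of escaping as a ClassCastException.
  static <T> T expectType(Object response, Class<T> expected) throws IOException {
    if (!expected.isInstance(response)) {
      String actual = (response == null) ? "null" : response.getClass().getName();
      throw new IOException(
          "Unexpected response type " + actual + ", expected " + expected.getName());
    }
    return expected.cast(response);
  }
}
```

With a helper like this, the caller would wrap the result of
requestToServer() in expectType(..., RemoteLocatorJoinResponse.class) instead
of casting it directly. This only makes the mismatch visible as a checked
exception; it does not address why the wrong object is received.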
*The solution:*
I think the locator must not assume a version in this case, but should instead
throw the EOFException, in which case the locator discovery thread will retry
after 10 seconds.
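The proposed behaviour can be sketched roughly as follows: an EOFException
is propagated out of the attempt and the whole attempt is retried after a
delay, with no fallback to an assumed version. This is only an illustration
under stated assumptions. VersionProbeRetry, Attempt and retryOnEof are
hypothetical names, not Geode APIs, and the bounded attempt count is an
assumption added so the sketch terminates.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical sketch, not Geode code: retry an exchange when the peer closes
// the connection, instead of guessing the peer's version.
class VersionProbeRetry {

  // One complete discovery attempt (version request plus join request).
  interface Attempt<T> {
    T run() throws IOException;
  }

  // Runs the attempt up to maxAttempts times, sleeping delayMillis between
  // tries. Only EOFException triggers a retry; other IOExceptions propagate.
  static <T> T retryOnEof(Attempt<T> attempt, int maxAttempts, long delayMillis)
      throws IOException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    EOFException last = null;
    for (int i = 1; i <= maxAttempts; i++) {
      try {
        return attempt.run();
      } catch (EOFException e) {
        last = e; // remember the failure and try again after the delay
        if (i < maxAttempts) {
          Thread.sleep(delayMillis);
        }
      }
    }
    throw last; // every attempt ended with EOFException
  }
}
```

A delayMillis of 10000 would correspond to the 10 second retry interval
mentioned above.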
was:
When running Apache geode in Kubernetes, then in some cases ClassCastException
is thrown when locator discovery is performed. This exception occurs when
locator try to cast received Object to RemoteLocatorJoinResponse object. The
problem is that locator discovery thread is then stopped, and due to that,
locator discovery will never be successfully performed. The only way to trigger
locator discovery again is to restart the locator.
*The root cause of this issues is following:*
If locator gets EOFException when sending VersionRequest message, then it
automatically assumes that remote locator is running old version of geode which
doesn't support VersionRequest. Locator then uses the oldest known version and
sends RemoteLocatorJoinRequest towards the remote locator. Then locator tries
to read the response as follows:
{code:java}
public Object requestToServer(HostAndPort addr, Object request, int timeout,
boolean replyExpected) throws IOException, ClassNotFoundException {
...
Object response = objectDeserializer.readObject(versionedDataInputStream);
logger.debug("received response: {}", response);
return response;
}
{code}
Because locator reads response as Object, that means, it will read any type of
object. The function requestToServer() will then return object which will cause
ClassCastException, since it is not of type LocatorRequestJoinResponse:
{code:java}
public void exchangeRemoteLocators() {
...
RemoteLocatorJoinResponse response = (RemoteLocatorJoinResponse)
locatorClient
.requestToServer(locatorId.getHost(), request,
WAN_LOCATOR_CONNECTION_TIMEOUT, true);
{code}
*The solution:*
> ClassCastException occurs during remote locator discovery
> ---------------------------------------------------------
>
> Key: GEODE-9025
> URL: https://issues.apache.org/jira/browse/GEODE-9025
> Project: Geode
> Issue Type: Bug
> Reporter: Jakov Varenina
> Assignee: Jakov Varenina
> Priority: Major
>
> When running Apache Geode in Kubernetes, a ClassCastException is sometimes
> thrown during locator discovery. The exception occurs when the locator tries
> to cast the received Object to a RemoteLocatorJoinResponse. The locator
> discovery thread then stops, so locator discovery never completes
> successfully. The only way to trigger locator discovery again is to restart
> the locator.
> *The root cause of this issue is the following:*
> If the locator gets an EOFException when sending a VersionRequest message, it
> automatically assumes that the remote locator is running an old version of
> Geode that does not support the VersionRequest message. The locator then uses
> the oldest known version and sends a RemoteLocatorJoinRequest to the remote
> locator. The locator then tries to read the response as follows:
> {code:java}
> public Object requestToServer(HostAndPort addr, Object request, int timeout,
>     boolean replyExpected) throws IOException, ClassNotFoundException {
>   ...
>   Object response = objectDeserializer.readObject(versionedDataInputStream);
>   logger.debug("received response: {}", response);
>   return response;
> }
> {code}
> Because the locator reads the response as an Object, it will accept any type
> of object. The function requestToServer() then returns an object that causes
> a ClassCastException, since it is not of type RemoteLocatorJoinResponse:
> {code:java}
> public void exchangeRemoteLocators() {
>   ...
>   RemoteLocatorJoinResponse response = (RemoteLocatorJoinResponse) locatorClient
>       .requestToServer(locatorId.getHost(), request,
>           WAN_LOCATOR_CONNECTION_TIMEOUT, true);
> {code}
>
> *The solution:*
> I think the locator must not assume a version in this case, but should
> instead throw the EOFException, in which case the locator discovery thread
> will retry after 10 seconds.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)