Kamilla Aslami created GEODE-9350:
-------------------------------------
Summary: ShunnedMemberException after MemberJoinedEvent is triggered
Key: GEODE-9350
URL: https://issues.apache.org/jira/browse/GEODE-9350
Project: Geode
Issue Type: Bug
Components: membership
Affects Versions: 1.14.0, 1.15.0
Reporter: Kamilla Aslami
While investigating GEODE-9070, we noticed a problem: when a server tries to
join a cluster, membership fails soon after with a ShunnedMemberException:
{noformat}
org.apache.geode.distributed.internal.direct.ShunnedMemberException: Member is being shunned: ccf730fb2b62(161)<v2>:41002
	at org.apache.geode.distributed.internal.direct.DirectChannel.getConnections(DirectChannel.java:469)
	at org.apache.geode.distributed.internal.direct.DirectChannel.sendToMany(DirectChannel.java:283)
	at org.apache.geode.distributed.internal.direct.DirectChannel.sendToOne(DirectChannel.java:190)
	at org.apache.geode.distributed.internal.direct.DirectChannel.send(DirectChannel.java:550)
	at org.apache.geode.distributed.internal.DistributionImpl.directChannelSend(DistributionImpl.java:354)
	at org.apache.geode.distributed.internal.DistributionImpl.send(DistributionImpl.java:296)
	at org.apache.geode.distributed.internal.ClusterDistributionManager.sendViaMembershipManager(ClusterDistributionManager.java:2068)
	at org.apache.geode.distributed.internal.ClusterDistributionManager.sendOutgoing(ClusterDistributionManager.java:1983)
	at org.apache.geode.distributed.internal.ClusterDistributionManager.sendMessage(ClusterDistributionManager.java:2028)
	at org.apache.geode.distributed.internal.ClusterDistributionManager.putOutgoing(ClusterDistributionManager.java:1085)
	at org.apache.geode.internal.cache.execute.StreamingFunctionOperation.getFunctionResultFrom(StreamingFunctionOperation.java:113)
	at org.apache.geode.internal.cache.execute.MemberFunctionExecutor.executeFunction(MemberFunctionExecutor.java:149)
	at org.apache.geode.internal.cache.execute.MemberFunctionExecutor.executeFunction(MemberFunctionExecutor.java:191)
	at org.apache.geode.internal.cache.execute.AbstractExecution.execute(AbstractExecution.java:397)
	at org.apache.geode.internal.cache.execute.AbstractExecution.execute(AbstractExecution.java:402)
	at org.apache.geode.modules.util.BootstrappingFunction.bootstrapMember(BootstrappingFunction.java:170)
	at org.apache.geode.modules.util.BootstrappingFunction.memberJoined(BootstrappingFunction.java:240)
	at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberJoinedEvent.handleEvent(ClusterDistributionManager.java:2498)
	at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEvent.handleEvent(ClusterDistributionManager.java:2451)
	at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEvent.handleEvent(ClusterDistributionManager.java:2440)
	at org.apache.geode.distributed.internal.ClusterDistributionManager.handleMemberEvent(ClusterDistributionManager.java:1406)
	at org.apache.geode.distributed.internal.ClusterDistributionManager.access$200(ClusterDistributionManager.java:109)
	at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEventInvoker.run(ClusterDistributionManager.java:1438)
	at java.base/java.lang.Thread.run(Thread.java:834)
{noformat}
Further analysis showed that the ShunnedMemberException is thrown because
GMSMembership.memberExists() returns false, which means that the member
ccf730fb2b62(161)<v2>:41002 was not in the view.

Looking at the stack trace, we noticed that
BootstrappingFunction.bootstrapMember() is executed on MemberJoinedEvent,
which is triggered by MembershipListener.newMemberConnected().
newMemberConnected() is called in GMSMembership.processView() before the new
view is installed, so the failure most likely happens because
BootstrappingFunction receives the event before the view has actually been
updated. A possible solution is to change GMSMembership.processView() so that
it calls MembershipListener.newMemberConnected() only after the new view is
installed.

This issue was introduced by the fix for GEODE-7245, which removed the
latestView lock from GMSMembership.memberExists(). Before GEODE-7245, this
method waited until GMSMembership.processView() released the lock, so the
problem described above could not happen. GEODE-7245 was back-ported to 1.14.
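
The ordering problem described above can be illustrated with a small toy
model. This is only a sketch, not Geode code: ToyMembership, its
notifyAfterInstall flag, and listenerSeesMember() are hypothetical names
invented for the illustration, and the listener runs synchronously here,
whereas in Geode the event is handled on a separate thread, which is what
makes the real failure timing-dependent. The sketch only shows why a listener
that checks membership while handling the join event sees different answers
depending on whether it is notified before or after the view is installed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/** Toy model of the notification ordering; none of these names are Geode APIs. */
public class ViewOrderingSketch {

  /** Hypothetical stand-in for a membership service holding the current view. */
  static class ToyMembership {
    private volatile Set<String> currentView = new HashSet<>();
    private final Consumer<String> joinListener;
    private final boolean notifyAfterInstall;

    ToyMembership(Consumer<String> joinListener, boolean notifyAfterInstall) {
      this.joinListener = joinListener;
      this.notifyAfterInstall = notifyAfterInstall;
    }

    /** Lock-free lookup against whatever view is currently installed. */
    boolean memberExists(String member) {
      return currentView.contains(member);
    }

    /** Computes the newly joined members, then installs the view and notifies. */
    void processView(Set<String> newView) {
      List<String> joined = new ArrayList<>();
      for (String member : newView) {
        if (!currentView.contains(member)) {
          joined.add(member);
        }
      }
      if (!notifyAfterInstall) {
        joined.forEach(joinListener); // listener runs against the OLD view
      }
      currentView = new HashSet<>(newView); // install the new view
      if (notifyAfterInstall) {
        joined.forEach(joinListener); // listener runs against the NEW view
      }
    }
  }

  /** Returns whether the join listener found the new member in the view. */
  static boolean listenerSeesMember(boolean notifyAfterInstall) {
    boolean[] seen = new boolean[1];
    ToyMembership[] holder = new ToyMembership[1];
    holder[0] = new ToyMembership(
        member -> seen[0] = holder[0].memberExists(member), notifyAfterInstall);
    holder[0].processView(Collections.singleton("server-1"));
    return seen[0];
  }

  public static void main(String[] args) {
    System.out.println("notify before install: " + listenerSeesMember(false)); // false
    System.out.println("notify after install:  " + listenerSeesMember(true)); // true
  }
}
```

In this model, notifying before installation corresponds to the behavior
described in this ticket, and notifying after installation corresponds to the
proposed change to processView().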