[
https://issues.apache.org/jira/browse/GEODE-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362201#comment-17362201
]
Eric Shu commented on GEODE-9248:
---------------------------------
The last commit can cause bucket data inconsistency.
Let's say server1 hosts bucket_1 and bucket_2 (both primary), server2 hosts a
bucket_1 replicate (but also has a client register interest/CQ for bucket_2),
server3 hosts a bucket_2 replicate (and has subscription needs for bucket_1),
and server4 only has subscription needs for bucket_2.
Tx has 2 operations on bucket_1 and bucket_2, then commits.
In Geode, the tx calculates that server2 and server3 each need to be sent 2
region commits (for bucket_1 and bucket_2), and that server4 only needs to be
sent a TXCommitMessage containing bucket_2.
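The expected distribution for this scenario can be sketched as follows. This is only an illustration, not Geode's actual recipient calculation: the class CommitPlanSketch, its plan method, and the map-based inputs are all hypothetical. It assumes a server should receive a region commit for every bucket touched by the tx that it either hosts as a redundant copy or has subscription interest in.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names exist in Geode.
public class CommitPlanSketch {

  // For each server, collect the tx buckets it hosts as a redundant copy
  // or has subscription interest in. Servers with no match are omitted.
  public static Map<String, Set<String>> plan(
      Map<String, Set<String>> redundantCopies,
      Map<String, Set<String>> subscriptions,
      Set<String> txBuckets) {
    Map<String, Set<String>> result = new LinkedHashMap<>();
    for (Map<String, Set<String>> source : List.of(redundantCopies, subscriptions)) {
      for (Map.Entry<String, Set<String>> entry : source.entrySet()) {
        Set<String> wanted = new TreeSet<>(entry.getValue());
        wanted.retainAll(txBuckets); // keep only buckets the tx touched
        if (!wanted.isEmpty()) {
          result.computeIfAbsent(entry.getKey(), k -> new TreeSet<>()).addAll(wanted);
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // The scenario from this comment; server1 is the primary and sends the commit.
    Map<String, Set<String>> redundantCopies = Map.of(
        "server2", Set.of("bucket_1"),
        "server3", Set.of("bucket_2"));
    Map<String, Set<String>> subscriptions = Map.of(
        "server2", Set.of("bucket_2"),
        "server3", Set.of("bucket_1"),
        "server4", Set.of("bucket_2"));
    System.out.println(
        plan(redundantCopies, subscriptions, Set.of("bucket_1", "bucket_2")));
  }
}
```

Under these assumptions server2 and server3 each end up with both buckets, and server4 with bucket_2 only, which matches the distribution described above.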
The above check-in could cause 2 messages to be sent to server2, of which only
the second one (the tx commit message supposedly sent only to server4) is
processed, so the redundant copy on server2 misses an event.
Code that causes the first (correct) TXCommitMessage to be replaced:
public void add(TXCommitMessage msg) {
  synchronized (this.txInProgress) {
    final Object key = msg.getTrackerKey();
    if (key == null) {
      Assert.assertTrue(false, "TXFarSideCMTracker must have a non-null key for message " + msg);
    }
    this.txInProgress.put(key, msg);
    this.txInProgress.notifyAll();
  }
}
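The replacement can be reproduced outside Geode with a plain map. The sketch below is a minimal stand-in, not the real TXFarSideCMTracker: the class TrackerSketch, the string key "tx-1", and the list-of-buckets payload are assumptions made for illustration. It relies only on the documented behavior of Map.put, which overwrites the value for an existing key and returns the previous one.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the tracker; a message is modeled as the list
// of buckets it carries, keyed by a tx identifier.
public class TrackerSketch {
  private final Map<String, List<String>> txInProgress = new HashMap<>();

  // Same shape as the add method above: put, then notify waiters.
  // Returns the message that was overwritten, or null if there was none.
  public List<String> add(String trackerKey, List<String> buckets) {
    synchronized (txInProgress) {
      List<String> previous = txInProgress.put(trackerKey, buckets);
      txInProgress.notifyAll();
      return previous;
    }
  }

  public List<String> get(String trackerKey) {
    synchronized (txInProgress) {
      return txInProgress.get(trackerKey);
    }
  }

  public static void main(String[] args) {
    TrackerSketch tracker = new TrackerSketch();
    // First message: both region commits (the correct one for server2).
    tracker.add("tx-1", List.of("bucket_1", "bucket_2"));
    // Second message for the same tx: only bucket_2.
    List<String> replaced = tracker.add("tx-1", List.of("bucket_2"));
    System.out.println("replaced: " + replaced);
    System.out.println("kept: " + tracker.get("tx-1"));
  }
}
```

In this sketch the second add silently drops the message that carried bucket_1, so only bucket_2 remains to be processed for that tx.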
The reason two messages were sent to server2 is that the second tx commit
message (with only one region commit) is now sent to all adjunct servers:
recipients.removeAll(this.notificationOnlyMembers);
setRecipientsSendData(recipients, processor, rcl);
this.txState.setTailKeyOnEntries(-1L);
setRecipientsSendData(notificationOnlyMembers, processor, rcl);
> Server hosting cq subscription queue uneccessary fills bucketToTempQueueMap
> while in multi site split brain
> -----------------------------------------------------------------------------------------------------------
>
> Key: GEODE-9248
> URL: https://issues.apache.org/jira/browse/GEODE-9248
> Project: Geode
> Issue Type: Bug
> Reporter: Jakov Varenina
> Assignee: Jakov Varenina
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.15.0
>
>
> The problem reproduces when you use transactions and have more servers than
> redundant copies of the partition region, and also events are queued in
> parallel gateway-senders due to ongoing multi-site split brain. In this case
> all members send events to the member with subscription queue, which then
> fills variable *bucketToTempQueueMap* with traffic intended for the buckets
> that it doesn't host.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)