[ 
https://issues.apache.org/jira/browse/GEODE-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362201#comment-17362201
 ] 

Eric Shu commented on GEODE-9248:
---------------------------------

The last commit can cause bucket data inconsistency.

Let's say server1 hosts bucket_1 and bucket_2 (both primary), server2 hosts a 
bucket_1 replicate (but also has client register interest/CQ for bucket_2), 
server3 hosts a bucket_2 replicate (and has subscription needs for bucket_1), 
and server4 only has subscription needs for bucket_2.

Tx has 2 operations on bucket_1 and bucket_2, then commits.

In Geode, the tx calculates that server2 and server3 need to be sent 2 region 
commits (for bucket_1 and bucket_2), and server4 only needs to be sent a 
TXCommitMessage containing bucket_2.

The above check-in could cause 2 messages to be sent to server2, with only the 
second one (the tx commit message supposedly sent only to server4) being 
processed, so the redundant copy on server2 misses an event.

Code that causes the first (correct) TXCommitMessage to be replaced:
  public void add(TXCommitMessage msg) {
    synchronized (this.txInProgress) {
      final Object key = msg.getTrackerKey();
      if (key == null) {
        Assert.assertTrue(false,
            "TXFarSideCMTracker must have a non-null key for message " + msg);
      }
      this.txInProgress.put(key, msg);
      this.txInProgress.notifyAll();
    }
  }
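
To illustrate the replacement, here is a minimal standalone sketch, not the 
actual Geode code. DemoCommitMessage, TrackerOverwriteDemo and the key "tx-1" 
are hypothetical stand-ins; the only thing carried over from the method above 
is that messages are stored with a plain put() under their tracker key, so a 
second message with the same key silently replaces the first.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for TXCommitMessage: just a tracker key and the
// regions (buckets) the message carries commits for.
final class DemoCommitMessage {
  final String trackerKey;
  final List<String> regions;

  DemoCommitMessage(String trackerKey, List<String> regions) {
    this.trackerKey = trackerKey;
    this.regions = regions;
  }
}

// Hypothetical stand-in for the far-side tracker; add() mirrors the put()
// in the snippet above.
final class TrackerOverwriteDemo {
  private final Map<String, DemoCommitMessage> txInProgress = new HashMap<>();

  void add(DemoCommitMessage msg) {
    synchronized (txInProgress) {
      // A second message with the same key replaces the first one here.
      txInProgress.put(msg.trackerKey, msg);
      txInProgress.notifyAll();
    }
  }

  DemoCommitMessage get(String key) {
    synchronized (txInProgress) {
      return txInProgress.get(key);
    }
  }

  // Simulates server2 receiving both messages for the same tx.
  static List<String> regionsLeftOnServer2() {
    TrackerOverwriteDemo tracker = new TrackerOverwriteDemo();
    // First (correct) message: region commits for both buckets.
    tracker.add(new DemoCommitMessage("tx-1", List.of("bucket_1", "bucket_2")));
    // Second message, meant for notification-only members: bucket_2 only.
    tracker.add(new DemoCommitMessage("tx-1", List.of("bucket_2")));
    return tracker.get("tx-1").regions;
  }

  public static void main(String[] args) {
    System.out.println(regionsLeftOnServer2()); // prints [bucket_2]
  }
}
```

In this sketch the bucket_1 commit is no longer tracked after the second add, 
which matches the missing event on the server2 redundant copy described above.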

The reason two messages were sent to server2 is that the second tx commit 
message (with only one region commit) is now sent to all adjunct servers:
                recipients.removeAll(this.notificationOnlyMembers);
                setRecipientsSendData(recipients, processor, rcl);

                this.txState.setTailKeyOnEntries(-1L);
                setRecipientsSendData(notificationOnlyMembers, processor, rcl);

> Server hosting cq subscription queue unnecessarily fills bucketToTempQueueMap 
> while in multi site split brain
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-9248
>                 URL: https://issues.apache.org/jira/browse/GEODE-9248
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Jakov Varenina
>            Assignee: Jakov Varenina
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.15.0
>
>
> The problem reproduces when you use transactions and have more servers than 
> redundant copies of the partition region, and also events are queued in 
> parallel gateway-senders due to ongoing multi-site split brain. In this case 
> all members send events to the member with subscription queue, which then 
> fills variable *bucketToTempQueueMap* with traffic intended for the buckets 
> that it doesn't host.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
