Eric Shu created GEODE-8465:
-------------------------------
Summary: A stray event can resurrect on a client due to a race
condition
Key: GEODE-8465
URL: https://issues.apache.org/jira/browse/GEODE-8465
Project: Geode
Issue Type: Bug
Components: client queues
Reporter: Eric Shu
A member with primary queue could send the QRM for the event before it removes
the event from its HARegionQueue – this causes a race that a newly started
member can gii and get the events from the member with primary queue after it
has already sent the QRM for the event to existing members holding the
secondary queue.
The following analysis showing that the race exists.
{noformat}
The stray create of Object_5234on edge 10634 shows this issue with this cause.
>From edge 10634, we know the stray create is for v3 of the object and from
>bridge 15141.
[fine 2020/06/17 16:38:49.621 PDT edgegemfire_2_1_host1_10634 <Cache Client
Updater Thread on
rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_15141:15141)<ec><v35>:41007(version:UNKNOWN[ordinal=125])
port 26905> tid=0x541] VersionedThinRegionEntryHeapStringKey2@1b56c386
(key=Object_5234; rawValue=VMCachedDeserializable@526996532; version=
{v3; rv28;
mbr=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750)<ec><v27>:41007;
ds=2; time=1592434556093}
;member=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750)<ec><v27>:41007)
dispatching event
EntryEventImpl[op=CREATE;region=/DefaultRegion;key=Object_5234;oldValue=null;newValue=VMCachedDeserializable@526996532;callbackArg=Update
event originated in pid
10655;originRemote=true;originMember=rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655;version=
{v3; rv28;
mbr=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750)<ec><v27>:41007;
ds=2; time=1592434556093; remote}
;id=EventIDid=52bytes;threadID=52;sequenceID=108];isFromServer]
[fine 2020/06/17 16:38:49.621 PDT edgegemfire_2_1_host1_10634 <Cache Client
Updater Thread on
rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_15141:15141)<ec><v35>:41007(version:UNKNOWN[ordinal=125])
port 26905> tid=0x541] Put entry for region: /DefaultRegion key: Object_5234
callbackArgument: Update event originated in pid 10655
***This version (v3) is an update. The primary queue at the time is
bridgegemfire_2_2_host1_10538. The event is enqueued and dispatched ***
[fine 2020/06/17 15:55:56.098 PDT bridgegemfire_2_2_host1_10538 <P2P message
reader for
rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125])
unshared ordered uid=358 dom #1 port=43914> tid=0x16d] DefaultRegion:
notifying 1 bridge servers of event:
EntryEventImpl[op=UPDATE;region=/DefaultRegion;key=Object_5234;oldValue=VMCachedDeserializable@1538889434;newValue=VMCachedDeserializable@2141181294;callbackArg=Update
event originated in pid
10655;originRemote=true;originMember=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125]);version=
{v3; rv28;
mbr=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125]);
ds=2; time=1592434556093; remote}
;context=identity(rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655,connection=1;id=EventIDid=52bytes;threadID=52;sequenceID=108];routing=]
[fine 2020/06/17 15:55:56.098 PDT bridgegemfire_2_2_host1_10538 <P2P message
reader for
rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125])
unshared ordered uid=358 dom #1 port=43914> tid=0x16d]
VMStatsRegionEntryHeapLongKey@38ccc0e5 (key=3867;
rawValue=HAEventWrapper[refCount=2;
msg=ClientUpdateMessageImpl[op=AFTER_UPDATE;region=/DefaultRegion;key=Object_5234;isObject=1;cbArg=Update
event originated in pid
10655;memberId=identity(rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655,connection=1;eventId=EventIDid=52bytes;threadID=52;sequenceID=108];shouldConflate=false;versionTag=
{v3; rv28;
mbr=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125]);
ds=2; time=1592434556093; remote}
;hasCqs=false]]) dispatching event
EntryEventImpl[op=CREATE;region=/_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue;key=3867;oldValue=null;newValue=HAEventWrapper[refCount=2;
msg=ClientUpdateMessageImpl[op=AFTER_UPDATE;region=/DefaultRegion;key=Object_5234;isObject=1;cbArg=Update
event originated in pid
10655;memberId=identity(rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655,connection=1;eventId=EventIDid=52bytes;threadID=52;sequenceID=108];shouldConflate=false;versionTag=
{v3; rv28;
mbr=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125]);
ds=2; time=1592434556093; remote}
;hasCqs=false]];callbackArg=null;originRemote=false;originMember=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_2_host1_10538:10538)<ec><v2>:41006]
***QRM is sent to the 2 peers at the time with the region ***
[fine 2020/06/17 15:55:56.213 PDT bridgegemfire_2_2_host1_10538 <Queue Removal
Thread> tid=0x83] Sending
(QueueRemovalMessage[_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue,
1, EventIDid=52bytes;threadID=52;sequenceID=108],
_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_3_host1_10666:10666:loner):60462:26db73c4:edgegemfire_2_3_host1_10666_1_queue,
1, EventIDid=52bytes;threadID=52;sequenceID=108]]) to 2 peers
([rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_1_host1_18750:18750)<ec><v27>:41007(version:UNKNOWN[ordinal=125])@277,
rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_4_host1_23574:23574)<ec><v31>:41011(version:UNKNOWN[ordinal=125])@566])
via tcp/ip
***the newly started bridgegemfire_2_3_24320 creates the HARegionQueue and gii
from the member with primary queue (10538) ***
[info 2020/06/17 15:56:08.800 PDT <Client Queue Initialization Thread 1>
tid=0x76] Initializing region
_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue
[info 2020/06/17 15:56:09.285 PDT <Client Queue Initialization Thread 1>
tid=0x76]
_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue
is done getting image from
rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_2_host1_10538:10538)<ec><v2>:41006(version:GEODE
1.3.0). isDeltaGII is false
[debug 2020/06/17 15:56:09.377 PDT <Client Queue Initialization Thread 1>
tid=0x76] adding haContainerKey to HARegion at
1:HAEventWrapper[region=/DefaultRegion; key=Object_5234; refCount=1;
putInProgress=0;
event=EventID[rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655;threadID=52;sequenceID=108];
no message] for
_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue
[debug 2020/06/17 15:56:09.377 PDT <Client Queue Initialization Thread 1>
tid=0x76] invoking listeners:
[org.apache.geode.internal.cache.ha.HARegionQueue$1@2e69472b]
[debug 2020/06/17 15:56:09.377 PDT <Client Queue Initialization Thread 1>
tid=0x76] Adding position 1 to available IDs. Region:
_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue
[debug 2020/06/17 15:56:09.377 PDT <Client Queue Initialization Thread 1>
tid=0x76] RegionQueue on
_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue(backup)
done putting GII data into queue
***Please note that the event was being removed from its primary queue after
QRM has been sent and bridge 24320 gii the events from it ***
[fine 2020/06/17 15:56:09.666 PDT bridgegemfire_2_2_host1_10538 <Client Message
Dispatcher for
rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634>
tid=0x84] VMStatsRegionEntryHeapLongKey@38ccc0e5 (key=3867;
rawValue=REMOVED_PHASE2) dispatching event
EntryEventImpl[op=LOCAL_DESTROY;region=/_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue;key=3867;oldValue=HAEventWrapper[region=/DefaultRegion;
key=Object_5234; refCount=1; inContainer=true; putInProgress=false;
event=EventIDid=52bytes;threadID=52;sequenceID=108]; no
message];newValue=null;callbackArg=null;originRemote=false;originMember=rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_2_host1_10538:10538)<ec><v2>:41006]
***Bridge 24320 now has the event for Object_5234(v3) that has been sent to the
client and won't get QRM again ***
***Later, bridge 15141 gii from 24320 and got the event ***
[info 2020/06/17 16:17:36.454 PDT <Client Queue Initialization Thread 2>
tid=0x78] Initializing region
_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue
[info 2020/06/17 16:17:40.187 PDT <Client Queue Initialization Thread 2>
tid=0x78]
_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue
is done getting image from
rs-GEM-2944-1535-hydra-client-1(bridgegemfire_2_3_host1_24320:24320)<ec><v33>:41010.
isDeltaGII is false
[debug 2020/06/17 16:17:40.415 PDT <Client Queue Initialization Thread 2>
tid=0x78]
_gfe_non_durable_client_with_id_rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634_1_queue
processing queue key 1 and value HAEventWrapper[region=/DefaultRegion;
key=Object_5234; refCount=1; putInProgress=0;
event=EventID[rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_2_host1_10655:loner):60216:b0d873c4:edgegemfire_2_2_host1_10655;threadID=52;sequenceID=108];
no message]
***It will send the event to client after it becomes the primary queue holder
for client 10634 ***
[debug 2020/06/17 16:38:49.125 PDT <ServerConnection on port 26905 Thread 5>
tid=0xb4]
CacheClientProxy[identity(rs-GEM-2944-1535-hydra-client-1(edgegemfire_2_1_host1_10634:10634:loner):51062:4bd673c4:edgegemfire_2_1_host1_10634,connection=1;
port=53006; primary=true; version=GEODE 1.3.0]: Queueing marker message.
<org.apache.geode.internal.cache.tier.sockets.ClientMarkerMessageImpl@75bf95c8>.
The queue contains 57 entries.
{noformat}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)