[jira] [Assigned] (IGNITE-20484) NPE when some operation occurs when the primary replica is changing

Vyacheslav Koptilin (Jira) Wed, 04 Oct 2023 06:38:12 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-20484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Vyacheslav Koptilin reassigned IGNITE-20484:
--------------------------------------------

    Assignee: Vladislav Pyatkov

> NPE when some operation occurs when the primary replica is changing
> -------------------------------------------------------------------
>
>                 Key: IGNITE-20484
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20484
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladislav Pyatkov
>            Assignee: Vladislav Pyatkov
>            Priority: Major
>              Labels: ignite-3
>
> *Motivation*
> It happens that when the request is created, the primary replica is in this 
> node, but when the request is executed in the replica, it has already lost 
> its role.
> {noformat}
> [2023-09-25T11:03:24,408][WARN 
> ][%iprct_tpclh_2%metastorage-watch-executor-2][ReplicaManager] Failed to 
> process replica request [request=ReadWriteSingleRowReplicaRequestImpl 
> [binaryRowMessage=BinaryRowMessageImpl 
> [binaryTuple=java.nio.HeapByteBuffer[pos=0 lim=9 cap=9], schemaVersion=1], 
> commitPartitionId=TablePartitionIdMessageImpl [partitionId=0, tableId=4], 
> full=true, groupId=4_part_0, requestType=RW_UPSERT, term=111124742070009862, 
> timestampLong=111124742430588928, 
> transactionId=018acb5d-4e54-0006-0000-0000705db0b1]]
>  java.util.concurrent.CompletionException: java.lang.NullPointerException
>     at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)
>  ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)
>  ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1081)
>  ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>  ~[?:?]
>     at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073) 
> ~[?:?]
>     at 
> org.apache.ignite.internal.util.PendingComparableValuesTracker.lambda$completeWaitersOnUpdate$0(PendingComparableValuesTracker.java:169)
>  ~[main/:?]
>     at java.util.concurrent.ConcurrentMap.forEach(ConcurrentMap.java:122) 
> ~[?:?]
>     at 
> org.apache.ignite.internal.util.PendingComparableValuesTracker.completeWaitersOnUpdate(PendingComparableValuesTracker.java:169)
>  ~[main/:?]
>     at 
> org.apache.ignite.internal.util.PendingComparableValuesTracker.update(PendingComparableValuesTracker.java:103)
>  ~[main/:?]
>     at 
> org.apache.ignite.internal.metastorage.server.time.ClusterTimeImpl.updateSafeTime(ClusterTimeImpl.java:146)
>  ~[main/:?]
>     at 
> org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.onSafeTimeAdvanced(MetaStorageManagerImpl.java:849)
>  ~[main/:?]
>     at 
> org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$1.onSafeTimeAdvanced(MetaStorageManagerImpl.java:456)
>  ~[main/:?]
>     at 
> org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$advanceSafeTime$7(WatchProcessor.java:269)
>  ~[main/:?]
>     at 
> java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:783)
>  [?:?]
>     at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
>  [?:?]
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>     at java.lang.Thread.run(Thread.java:834) [?:?]
> Caused by: java.lang.NullPointerException
>     at 
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$ensureReplicaIsPrimary$161(PartitionReplicaListener.java:2415)
>  ~[main/:?]
>     at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
>  ~[?:?]
>     ... 15 more
> {noformat}
> *Definition of done*
> In this case, we should throw the correct exception because the request 
> cannot be handled in this replica anymore, and the matched transaction will 
> be rolled back.
> *Implementation notes*
> Do not forget to check all places where the issue is mentioned (especially in 
> TODO section).
> As discussed with [~sanpwc]:
> This exception is likely to be thrown when 
> - we successfully get a primary replica on one node
> - send a message and the message is slightly slow to be delivered
> - we handle the received message on the recepient node and run 
> {{placementDriver.getPrimaryReplica}}. 
> If the previous lease has expired by the time we handle the message, the call 
> to {{placementDriver}} will result in a {{null}} value instead of a 
> {{ReplicaMeta}} instance. Hence the NPE.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Assigned] (IGNITE-20484) NPE when some operation occurs when the primary replica is changing

Reply via email to