[ https://issues.apache.org/jira/browse/IGNITE-20484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vyacheslav Koptilin reassigned IGNITE-20484:
--------------------------------------------

    Assignee: Vladislav Pyatkov

> NPE when some operation occurs when the primary replica is changing
> -------------------------------------------------------------------
>
>                 Key: IGNITE-20484
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20484
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladislav Pyatkov
>            Assignee: Vladislav Pyatkov
>            Priority: Major
>              Labels: ignite-3
>
> *Motivation*
> It happens that when the request is created, the primary replica is in this
> node, but when the request is executed in the replica, it has already lost
> its role.
> {noformat}
> [2023-09-25T11:03:24,408][WARN ][%iprct_tpclh_2%metastorage-watch-executor-2][ReplicaManager] Failed to process replica request [request=ReadWriteSingleRowReplicaRequestImpl [binaryRowMessage=BinaryRowMessageImpl [binaryTuple=java.nio.HeapByteBuffer[pos=0 lim=9 cap=9], schemaVersion=1], commitPartitionId=TablePartitionIdMessageImpl [partitionId=0, tableId=4], full=true, groupId=4_part_0, requestType=RW_UPSERT, term=111124742070009862, timestampLong=111124742430588928, transactionId=018acb5d-4e54-0006-0000-0000705db0b1]]
> java.util.concurrent.CompletionException: java.lang.NullPointerException
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) ~[?:?]
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319) ~[?:?]
>     at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1081) ~[?:?]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
>     at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073) ~[?:?]
>     at org.apache.ignite.internal.util.PendingComparableValuesTracker.lambda$completeWaitersOnUpdate$0(PendingComparableValuesTracker.java:169) ~[main/:?]
>     at java.util.concurrent.ConcurrentMap.forEach(ConcurrentMap.java:122) ~[?:?]
>     at org.apache.ignite.internal.util.PendingComparableValuesTracker.completeWaitersOnUpdate(PendingComparableValuesTracker.java:169) ~[main/:?]
>     at org.apache.ignite.internal.util.PendingComparableValuesTracker.update(PendingComparableValuesTracker.java:103) ~[main/:?]
>     at org.apache.ignite.internal.metastorage.server.time.ClusterTimeImpl.updateSafeTime(ClusterTimeImpl.java:146) ~[main/:?]
>     at org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.onSafeTimeAdvanced(MetaStorageManagerImpl.java:849) ~[main/:?]
>     at org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$1.onSafeTimeAdvanced(MetaStorageManagerImpl.java:456) ~[main/:?]
>     at org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$advanceSafeTime$7(WatchProcessor.java:269) ~[main/:?]
>     at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:783) [?:?]
>     at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
>     at java.lang.Thread.run(Thread.java:834) [?:?]
> Caused by: java.lang.NullPointerException
>     at org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$ensureReplicaIsPrimary$161(PartitionReplicaListener.java:2415) ~[main/:?]
>     at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072) ~[?:?]
>     ... 15 more
> {noformat}
> *Definition of done*
> In this case, we should throw the correct exception because the request
> cannot be handled in this replica anymore, and the matched transaction will
> be rolled back.
> *Implementation notes*
> Do not forget to check all places where the issue is mentioned (especially in
> TODO section).
> As discussed with [~sanpwc]:
> This exception is likely to be thrown when
> - we successfully get a primary replica on one node
> - send a message and the message is slightly slow to be delivered
> - we handle the received message on the recipient node and run {{placementDriver.getPrimaryReplica}}.
> If the previous lease has expired by the time we handle the message, the call
> to {{placementDriver}} will result in a {{null}} value instead of a
> {{ReplicaMeta}} instance. Hence the NPE.
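
The race described in the ticket can be illustrated with a small, self-contained Java sketch. It is not the Ignite code and not the actual fix: LeaseMeta, NotPrimaryException and the parameters of ensureReplicaIsPrimary below are hypothetical stand-ins, and the lookup future merely plays the role of the {{placementDriver.getPrimaryReplica}} call. The only point is that a null lookup result (expired lease) is turned into a descriptive failure that the caller can use to roll the transaction back, instead of an NPE inside the thenCompose lambda.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class PrimaryCheckSketch {
    /** Hypothetical stand-in for lease metadata; NOT the real Ignite ReplicaMeta. */
    static final class LeaseMeta {
        final String leaseholder;
        final long term;

        LeaseMeta(String leaseholder, long term) {
            this.leaseholder = leaseholder;
            this.term = term;
        }
    }

    /** Hypothetical exception: the local replica cannot serve the request as primary. */
    static final class NotPrimaryException extends RuntimeException {
        NotPrimaryException(String message) {
            super(message);
        }
    }

    /**
     * Completes normally only if the lookup returns a lease held by localNode
     * with the term the request was created for. A null result (assumed here to
     * mean "the previous lease has expired") becomes an explicit failure.
     */
    static CompletableFuture<Void> ensureReplicaIsPrimary(
            CompletableFuture<LeaseMeta> primaryLookup, String localNode, long expectedTerm) {
        return primaryLookup.thenCompose(meta -> {
            if (meta == null) {
                // Dereferencing meta here is what would produce the NPE from the ticket.
                return CompletableFuture.failedFuture(
                        new NotPrimaryException("No primary replica lease found; it may have expired"));
            }
            if (!localNode.equals(meta.leaseholder) || meta.term != expectedTerm) {
                return CompletableFuture.failedFuture(
                        new NotPrimaryException("Primary replica has changed to " + meta.leaseholder));
            }
            return CompletableFuture.completedFuture(null);
        });
    }

    public static void main(String[] args) {
        // Simulate a lookup that finds no lease at the time the request is handled.
        CompletableFuture<LeaseMeta> expiredLease = CompletableFuture.completedFuture(null);
        try {
            ensureReplicaIsPrimary(expiredLease, "node-1", 42L).join();
        } catch (CompletionException e) {
            System.out.println("Request rejected: " + e.getCause().getMessage());
        }
    }
}
```

A caller of such a check could treat NotPrimaryException as the signal to roll back the matched transaction, as the definition of done asks; which exception type the real code should throw is left open here.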