The isPresent(tempMin) check returns true if the index is in the
snapshot but not in the log. In that case the call to get(tempMin)
returns null. While we can and should guard against this, it's a bit
bothersome that we hit this case at all. It would be nice to know why,
and to confirm that it's a valid edge case and not a symptom of another bug.
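
For illustration, a guard could look roughly like the sketch below. It is
self-contained and is not the actual SnapshotManager code: SimpleLog, Entry
and trimLog are hypothetical stand-ins, and isPresent here only mimics the
behaviour described above (true for a snapshot-only index).

```java
import java.util.HashMap;
import java.util.Map;

public class TrimLogGuardSketch {
    /** Hypothetical stand-in for a replicated log entry; only the term matters here. */
    static final class Entry {
        final long term;
        Entry(long term) { this.term = term; }
    }

    /** Hypothetical, simplified log. Indexes below the first appended entry exist only in the snapshot. */
    static final class SimpleLog {
        final Map<Long, Entry> entries = new HashMap<>();
        long snapshotIndex = -1;
        long snapshotTerm = -1;
        long lastIndex = -1;

        void append(long index, long term) {
            entries.put(index, new Entry(term));
            lastIndex = Math.max(lastIndex, index);
        }

        // Assumption: mimics the described behaviour, i.e. true for any index up to
        // lastIndex, including indexes that are only covered by the snapshot.
        boolean isPresent(long index) {
            return index >= 0 && index <= lastIndex;
        }

        // Returns null when the index is not held in the in-memory log.
        Entry get(long index) {
            return entries.get(index);
        }
    }

    /** Trims the log up to tempMin; returns the trimmed-to index, or -1 if nothing was trimmed. */
    static long trimLog(SimpleLog log, long tempMin) {
        if (!log.isPresent(tempMin)) {
            return -1;
        }
        Entry entry = log.get(tempMin);
        if (entry == null) {
            // Index is covered by the snapshot only: skip the trim instead of
            // dereferencing null. A real fix would probably also log this.
            return -1;
        }
        log.snapshotIndex = tempMin;
        log.snapshotTerm = entry.term;
        log.entries.keySet().removeIf(i -> i <= tempMin);
        return tempMin;
    }

    public static void main(String[] args) {
        SimpleLog log = new SimpleLog();
        log.append(5, 2);
        log.append(6, 2);
        System.out.println(trimLog(log, 3)); // snapshot-only index, guarded
        System.out.println(trimLog(log, 5)); // index held in the log
    }
}
```

The guard only avoids the NPE; it does not explain how the index ended up in
the snapshot but not in the log, which is the part still worth tracking down.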

On Mon, Apr 10, 2017 at 11:15 AM, Srini Seetharaman <
srini.seethara...@gmail.com> wrote:

> I did use Boron-SR3 and managed to hit this issue after a series of
> restarts. I will enable debug and file a bug after getting more info.
>
> In any case, it seems odd that we hit an NPE.
>
> On Fri, Apr 7, 2017 at 8:22 AM, Tom Pantelis <tompante...@gmail.com>
> wrote:
>
>>
>>
>> On Wed, Apr 5, 2017 at 3:18 PM, Srini Seetharaman <
>> srini.seethara...@gmail.com> wrote:
>>
>>> During this time, the shard leadership was in some weird state. I am not
>>> sure what "PreLeader" means. Can you clarify?
>>>
>>>
>> PreLeader is an intermediate state prior to Leader. On a leader change,
>> it applies any uncommitted transactions before accepting new ones. I
>> would suggest testing with Boron SR3 as it contains several changes/fixes.
>> I would also suggest running with
>> org.opendaylight.controller.cluster.datastore.Shard debug enabled so
>> there's a paper trail in case something goes wrong.
>>
>>
>>> -------------- instance 1 --------------
>>> member-1-shard-default-config: Follower
>>> member-1-shard-default-operational: Follower
>>> -------------- instance 2 --------------
>>> member-2-shard-default-config: Follower
>>> member-2-shard-default-operational: PreLeader
>>> -------------- instance 3 --------------
>>> member-3-shard-default-config: Leader
>>> member-3-shard-default-operational: Follower
>>>
>>>
>>> On Wed, Apr 5, 2017 at 12:17 PM, Srini Seetharaman <
>>> srini.seethara...@gmail.com> wrote:
>>>
>>>> Here is the code blurb from boron-sr2 from that SnapshotManager.java
>>>> file:
>>>>
>>>> 209                 //use the term of the temp-min, since we check for isPresent, entry will not be null
>>>> 210                 ReplicatedLogEntry entry = context.getReplicatedLog().get(tempMin);
>>>> 211                 context.getReplicatedLog().snapshotPreCommit(tempMin, entry.getTerm());
>>>> 212                 context.getReplicatedLog().snapshotCommit();
>>>> 213                 return tempMin;
>>>> 214             }
>>>>
>>>>
>>>> On Wed, Apr 5, 2017 at 12:15 PM, Srini Seetharaman <
>>>> srini.seethara...@gmail.com> wrote:
>>>>
>>>>>
>>>>> Hi,
>>>>> During one of my runs of bringing the interfaces of cluster members
>>>>> up and down, I hit the following NPE after all 3 instances were isolated
>>>>> twice.  Let me know if you need any more info besides the log below.
>>>>>
>>>>>
>>>>> 2017-04-05 19:08:51,860 | WARN  | lt-dispatcher-28 | ConcurrentDOMDataBroker          | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | Tx: DOM-32 Error during phase CAN_COMMIT, starting Abort
>>>>> org.opendaylight.controller.cluster.datastore.exceptions.NoShardLeaderException: Shard member-2-shard-default-operational currently has no leader. Try again later.
>>>>>         at org.opendaylight.controller.cluster.datastore.shardmanager.ShardManager.createNoShardLeaderException(ShardManager.java:723)[193:org.opendaylight.controller.sal-distributed-datastore:1.4.2.Boron-SR2]
>>>>>         at org.opendaylight.controller.cluster.datastore.shardmanager.ShardManager.onShardNotInitializedTimeout(ShardManager.java:537)[193:org.opendaylight.controller.sal-distributed-datastore:1.4.2.Boron-SR2]
>>>>>         at org.opendaylight.controller.cluster.datastore.shardmanager.ShardManager.handleCommand(ShardManager.java:216)[193:org.opendaylight.controller.sal-distributed-datastore:1.4.2.Boron-SR2]
>>>>>         at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:29)[187:org.opendaylight.controller.sal-clustering-commons:1.4.2.Boron-SR2]
>>>>>         at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)[181:com.typesafe.akka.persistence:2.4.7]
>>>>>         at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:97)[187:org.opendaylight.controller.sal-clustering-commons:1.4.2.Boron-SR2]
>>>>>         at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)[175:com.typesafe.akka.actor:2.4.7]
>>>>>         at akka.actor.Actor$class.aroundReceive(Actor.scala:484)[175:com.typesafe.akka.actor:2.4.7]
>>>>>         at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>>>>>         at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:633)[181:com.typesafe.akka.persistence:2.4.7]
>>>>>         at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:179)[181:com.typesafe.akka.persistence:2.4.7]
>>>>>         at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>>>>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[175:com.typesafe.akka.actor:2.4.7]
>>>>>         at akka.actor.ActorCell.invoke(ActorCell.scala:495)[175:com.typesafe.akka.actor:2.4.7]
>>>>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[175:com.typesafe.akka.actor:2.4.7]
>>>>>         at akka.dispatch.Mailbox.run(Mailbox.scala:224)[175:com.typesafe.akka.actor:2.4.7]
>>>>>         at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[175:com.typesafe.akka.actor:2.4.7]
>>>>>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>>>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>>>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>>>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>>> 2017-04-05 19:08:51,863 | WARN  | tor-ComputeTimer | GenericTransactionUtils          | 301 - com.infinera.sdn.utils.transaction - 0.1.0.SNAPSHOT | Transaction for add of object State [_cpuInfo=CpuInfo [_processorCount=6, _usage=0.48, augmentation=[]], _memInfo=MemInfo [_memFree=138797056, _memTotal=12302811136, augmentation=[]], _status=class org.opendaylight.yang.gen.v1.urn.infinera.system.compute.rev160510.Running, augmentation=[]] failed with error {}
>>>>> 2017-04-05 19:09:14,056 | INFO  | lt-dispatcher-35 | kka://opendaylight-cluster-data) | 176 - com.typesafe.akka.slf4j - 2.4.7 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.12:2550] - Leader is moving node [akka.tcp://opendaylight-cluster-data@172.17.0.11:2550] to [Up]
>>>>> 2017-04-05 19:09:14,057 | INFO  | lt-dispatcher-35 | ShardManager                     | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | shard-manager-operational: Received MemberUp: memberName: MemberName{name=member-1}, address: akka.tcp://opendaylight-cluster-data@172.17.0.11:2550
>>>>> 2017-04-05 19:09:14,057 | INFO  | lt-dispatcher-35 | ShardInformation                 | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | updatePeerAddress for peer member-1-shard-default-operational with address akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-operational/member-1-shard-default-operational
>>>>> 2017-04-05 19:09:14,057 | INFO  | lt-dispatcher-35 | ShardInformation                 | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | updatePeerAddress for peer member-1-shard-entity-ownership-operational with address akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-operational/member-1-shard-entity-ownership-operational
>>>>> 2017-04-05 19:09:14,058 | INFO  | lt-dispatcher-18 | ShardManager                     | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | shard-manager-config: Received MemberUp: memberName: MemberName{name=member-1}, address: akka.tcp://opendaylight-cluster-data@172.17.0.11:2550
>>>>> 2017-04-05 19:09:14,058 | INFO  | lt-dispatcher-18 | ShardInformation                 | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | updatePeerAddress for peer member-1-shard-default-config with address akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-config/member-1-shard-default-config
>>>>> 2017-04-05 19:09:14,068 | INFO  | lt-dispatcher-18 | ShardManager                     | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | shard-manager-config: All Shards are ready - data store config is ready, available count is 0
>>>>> 2017-04-05 19:09:14,068 | INFO  | lt-dispatcher-18 | Shard                            | 188 - org.opendaylight.controller.sal-akka-raft - 1.4.2.Boron-SR2 | Peer address for peer member-1-shard-default-config set to akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-config/member-1-shard-default-config
>>>>> 2017-04-05 19:09:14,063 | INFO  | lt-dispatcher-28 | EntityOwnershipShard             | 188 - org.opendaylight.controller.sal-akka-raft - 1.4.2.Boron-SR2 | Peer address for peer member-1-shard-entity-ownership-operational set to akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-operational/member-1-shard-entity-ownership-operational
>>>>> 2017-04-05 19:09:14,070 | INFO  | lt-dispatcher-33 | Shard                            | 188 - org.opendaylight.controller.sal-akka-raft - 1.4.2.Boron-SR2 | Peer address for peer member-1-shard-default-operational set to akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-operational/member-1-shard-default-operational
>>>>> 2017-04-05 19:11:31,513 | WARN  | lt-dispatcher-17 | OneForOneStrategy                | 176 - com.typesafe.akka.slf4j - 2.4.7 | null
>>>>> 2017-04-05 19:11:31,514 | WARN  | lt-dispatcher-18 | ShardManager                     | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | Supervisor Strategy caught unexpected exception - resuming
>>>>> java.lang.NullPointerException
>>>>>         at org.opendaylight.controller.cluster.raft.SnapshotManager$AbstractSnapshotState.doTrimLog(SnapshotManager.java:211)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>>         at org.opendaylight.controller.cluster.raft.SnapshotManager$Idle.trimLog(SnapshotManager.java:293)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>>         at org.opendaylight.controller.cluster.raft.SnapshotManager.trimLog(SnapshotManager.java:91)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>>         at org.opendaylight.controller.cluster.raft.behaviors.AbstractRaftActorBehavior.performSnapshotWithoutCapture(AbstractRaftActorBehavior.java:470)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>>         at org.opendaylight.controller.cluster.raft.behaviors.AbstractLeader.purgeInMemoryLog(AbstractLeader.java:400)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>>         at org.opendaylight.controller.cluster.raft.behaviors.AbstractLeader.handleAppendEntriesReply(AbstractLeader.java:368)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>>         at org.opendaylight.controller.cluster.raft.behaviors.AbstractRaftActorBehavior.handleMessage(AbstractRaftActorBehavior.java:404)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>>         at org.opendaylight.controller.cluster.raft.behaviors.AbstractLeader.handleMessage(AbstractLeader.java:457)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>>         at org.opendaylight.controller.cluster.raft.behaviors.PreLeader.handleMessage(PreLeader.java:49)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>>         at org.opendaylight.controller.cluster.raft.RaftActor.possiblyHandleBehaviorMessage(RaftActor.java:302)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>>         at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:290)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>>>         at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:29)[187:org.opendaylight.controller.sal-clustering-commons:1.4.2.Boron-SR2]
>>>>>         at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)[181:com.typesafe.akka.persistence:2.4.7]
>>>>>         at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:97)[187:org.opendaylight.controller.sal-clustering-commons:1.4.2.Boron-SR2]
>>>>>         at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)[175:com.typesafe.akka.actor:2.4.7]
>>>>>         at akka.actor.Actor$class.aroundReceive(Actor.scala:484)[175:com.typesafe.akka.actor:2.4.7]
>>>>>         at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>>>>>         at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:633)[181:com.typesafe.akka.persistence:2.4.7]
>>>>>         at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:179)[181:com.typesafe.akka.persistence:2.4.7]
>>>>>         at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>>>>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[175:com.typesafe.akka.actor:2.4.7]
>>>>>         at akka.actor.ActorCell.invoke(ActorCell.scala:495)[175:com.typesafe.akka.actor:2.4.7]
>>>>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[175:com.typesafe.akka.actor:2.4.7]
>>>>>         at akka.dispatch.Mailbox.run(Mailbox.scala:224)[175:com.typesafe.akka.actor:2.4.7]
>>>>>         at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[175:com.typesafe.akka.actor:2.4.7]
>>>>>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>>>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>>>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>>>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>>>
>>>>>
>>>>
>>>
>>> _______________________________________________
>>> controller-dev mailing list
>>> controller-dev@lists.opendaylight.org
>>> https://lists.opendaylight.org/mailman/listinfo/controller-dev
>>>
>>>
>>
>