I did use Boron-SR3 and managed to hit this issue after a series of restarts. I will enable debug logging and file a bug after getting more info.
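For reference, here is a small self-contained sketch of the failing trimLog path and a defensive variant. The classes below (ToyReplicatedLog, TrimLogSketch) are illustrative stand-ins I made up, not the real OpenDaylight types. The point is that, despite the comment in the SnapshotManager blurb quoted below, get(tempMin) can evidently return null (e.g. the entry has already left the in-memory log), so the unguarded entry.getTerm() at line 211 throws:

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for the in-memory replicated log: maps index -> term.
class ToyReplicatedLog {
    private final Map<Long, Long> termByIndex = new HashMap<>();

    void append(long index, long term) {
        termByIndex.put(index, term);
    }

    // Like ReplicatedLog.get(): null when the index is not in the log.
    Long getTerm(long index) {
        return termByIndex.get(index);
    }
}

final class TrimLogSketch {
    // Guarded version of the doTrimLog() idea: return the trimmed index,
    // or -1 when the entry is no longer in the in-memory log, instead of
    // dereferencing a null entry and throwing an NPE.
    static long trimLog(ToyReplicatedLog log, long tempMin) {
        Long term = log.getTerm(tempMin);
        if (term == null) {
            // Entry already trimmed/snapshotted - nothing to pre-commit.
            return -1;
        }
        // The real code would call snapshotPreCommit(tempMin, term)
        // followed by snapshotCommit() here.
        return tempMin;
    }
}
```

This is only a sketch of the failure mode, not a proposed patch; whether a silent skip is the right recovery in the actual PreLeader/snapshot interaction is for the maintainers to judge.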
In any case, it seems odd that we hit an NPE.

On Fri, Apr 7, 2017 at 8:22 AM, Tom Pantelis <tompante...@gmail.com> wrote:
>
> On Wed, Apr 5, 2017 at 3:18 PM, Srini Seetharaman
> <srini.seethara...@gmail.com> wrote:
>>
>> During this time, the shard leadership was in some weird state. I am not
>> sure what "PreLeader" means. Can you clarify?
>
> PreLeader is an intermediate state entered prior to Leader; it applies any
> uncommitted transactions left over from the leader change before accepting
> new ones. I would suggest testing with Boron SR3 as it contains several
> changes/fixes. I would also suggest running with
> org.opendaylight.controller.cluster.datastore.Shard debug enabled so
> there's a paper trail in case something goes wrong.
>
>> -------------- instance 1 --------------
>> member-1-shard-default-config: Follower
>> member-1-shard-default-operational: Follower
>> -------------- instance 2 --------------
>> member-2-shard-default-config: Follower
>> member-2-shard-default-operational: PreLeader
>> -------------- instance 3 --------------
>> member-3-shard-default-config: Leader
>> member-3-shard-default-operational: Follower
>>
>> On Wed, Apr 5, 2017 at 12:17 PM, Srini Seetharaman
>> <srini.seethara...@gmail.com> wrote:
>>>
>>> Here is the code blurb from boron-sr2 from that SnapshotManager.java file:
>>>
>>> 209         // use the term of the temp-min, since we check for isPresent, entry will not be null
>>> 210         ReplicatedLogEntry entry = context.getReplicatedLog().get(tempMin);
>>> 211         context.getReplicatedLog().snapshotPreCommit(tempMin, entry.getTerm());
>>> 212         context.getReplicatedLog().snapshotCommit();
>>> 213         return tempMin;
>>> 214     }
>>>
>>> On Wed, Apr 5, 2017 at 12:15 PM, Srini Seetharaman
>>> <srini.seethara...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> During one of my runs of bringing the cluster members' interfaces up and
>>>> down, I hit the following NPE after all 3 instances were isolated twice.
>>>> Let me know if you need any more info besides the log below.
>>>>
>>>> 2017-04-05 19:08:51,860 | WARN | lt-dispatcher-28 | ConcurrentDOMDataBroker | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | Tx: DOM-32 Error during phase CAN_COMMIT, starting Abort
>>>> org.opendaylight.controller.cluster.datastore.exceptions.NoShardLeaderException: Shard member-2-shard-default-operational currently has no leader. Try again later.
>>>> at org.opendaylight.controller.cluster.datastore.shardmanager.ShardManager.createNoShardLeaderException(ShardManager.java:723)[193:org.opendaylight.controller.sal-distributed-datastore:1.4.2.Boron-SR2]
>>>> at org.opendaylight.controller.cluster.datastore.shardmanager.ShardManager.onShardNotInitializedTimeout(ShardManager.java:537)[193:org.opendaylight.controller.sal-distributed-datastore:1.4.2.Boron-SR2]
>>>> at org.opendaylight.controller.cluster.datastore.shardmanager.ShardManager.handleCommand(ShardManager.java:216)[193:org.opendaylight.controller.sal-distributed-datastore:1.4.2.Boron-SR2]
>>>> at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:29)[187:org.opendaylight.controller.sal-clustering-commons:1.4.2.Boron-SR2]
>>>> at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)[181:com.typesafe.akka.persistence:2.4.7]
>>>> at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:97)[187:org.opendaylight.controller.sal-clustering-commons:1.4.2.Boron-SR2]
>>>> at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)[175:com.typesafe.akka.actor:2.4.7]
>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:484)[175:com.typesafe.akka.actor:2.4.7]
>>>> at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>>>> at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:633)[181:com.typesafe.akka.persistence:2.4.7]
>>>> at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:179)[181:com.typesafe.akka.persistence:2.4.7]
>>>> at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[175:com.typesafe.akka.actor:2.4.7]
>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:495)[175:com.typesafe.akka.actor:2.4.7]
>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[175:com.typesafe.akka.actor:2.4.7]
>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:224)[175:com.typesafe.akka.actor:2.4.7]
>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[175:com.typesafe.akka.actor:2.4.7]
>>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>>
>>>> 2017-04-05 19:08:51,863 | WARN | tor-ComputeTimer | GenericTransactionUtils | 301 - com.infinera.sdn.utils.transaction - 0.1.0.SNAPSHOT | Transaction for add of object State [_cpuInfo=CpuInfo [_processorCount=6, _usage=0.48, augmentation=[]], _memInfo=MemInfo [_memFree=138797056, _memTotal=12302811136, augmentation=[]], _status=class org.opendaylight.yang.gen.v1.urn.infinera.system.compute.rev160510.Running, augmentation=[]] failed with error {}
>>>> 2017-04-05 19:09:14,056 | INFO | lt-dispatcher-35 | kka://opendaylight-cluster-data) | 176 - com.typesafe.akka.slf4j - 2.4.7 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.12:2550] - Leader is moving node [akka.tcp://opendaylight-cluster-data@172.17.0.11:2550] to [Up]
>>>> 2017-04-05 19:09:14,057 | INFO | lt-dispatcher-35 | ShardManager | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | shard-manager-operational: Received MemberUp: memberName: MemberName{name=member-1}, address: akka.tcp://opendaylight-cluster-data@172.17.0.11:2550
>>>> 2017-04-05 19:09:14,057 | INFO | lt-dispatcher-35 | ShardInformation | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | updatePeerAddress for peer member-1-shard-default-operational with address akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-operational/member-1-shard-default-operational
>>>> 2017-04-05 19:09:14,057 | INFO | lt-dispatcher-35 | ShardInformation | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | updatePeerAddress for peer member-1-shard-entity-ownership-operational with address akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-operational/member-1-shard-entity-ownership-operational
>>>> 2017-04-05 19:09:14,058 | INFO | lt-dispatcher-18 | ShardManager | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | shard-manager-config: Received MemberUp: memberName: MemberName{name=member-1}, address: akka.tcp://opendaylight-cluster-data@172.17.0.11:2550
>>>> 2017-04-05 19:09:14,058 | INFO | lt-dispatcher-18 | ShardInformation | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | updatePeerAddress for peer member-1-shard-default-config with address akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-config/member-1-shard-default-config
>>>> 2017-04-05 19:09:14,068 | INFO | lt-dispatcher-18 | ShardManager | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | shard-manager-config: All Shards are ready - data store config is ready, available count is 0
>>>> 2017-04-05 19:09:14,068 | INFO | lt-dispatcher-18 | Shard | 188 - org.opendaylight.controller.sal-akka-raft - 1.4.2.Boron-SR2 | Peer address for peer member-1-shard-default-config set to akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-config/member-1-shard-default-config
>>>> 2017-04-05 19:09:14,063 | INFO | lt-dispatcher-28 | EntityOwnershipShard | 188 - org.opendaylight.controller.sal-akka-raft - 1.4.2.Boron-SR2 | Peer address for peer member-1-shard-entity-ownership-operational set to akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-operational/member-1-shard-entity-ownership-operational
>>>> 2017-04-05 19:09:14,070 | INFO | lt-dispatcher-33 | Shard | 188 - org.opendaylight.controller.sal-akka-raft - 1.4.2.Boron-SR2 | Peer address for peer member-1-shard-default-operational set to akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-operational/member-1-shard-default-operational
>>>> 2017-04-05 19:11:31,513 | WARN | lt-dispatcher-17 | OneForOneStrategy | 176 - com.typesafe.akka.slf4j - 2.4.7 | null
>>>> 2017-04-05 19:11:31,514 | WARN | lt-dispatcher-18 | ShardManager | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | Supervisor Strategy caught unexpected exception - resuming
>>>> java.lang.NullPointerException
>>>> at org.opendaylight.controller.cluster.raft.SnapshotManager$AbstractSnapshotState.doTrimLog(SnapshotManager.java:211)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>> at org.opendaylight.controller.cluster.raft.SnapshotManager$Idle.trimLog(SnapshotManager.java:293)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>> at org.opendaylight.controller.cluster.raft.SnapshotManager.trimLog(SnapshotManager.java:91)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>> at org.opendaylight.controller.cluster.raft.behaviors.AbstractRaftActorBehavior.performSnapshotWithoutCapture(AbstractRaftActorBehavior.java:470)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>> at org.opendaylight.controller.cluster.raft.behaviors.AbstractLeader.purgeInMemoryLog(AbstractLeader.java:400)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>> at org.opendaylight.controller.cluster.raft.behaviors.AbstractLeader.handleAppendEntriesReply(AbstractLeader.java:368)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>> at org.opendaylight.controller.cluster.raft.behaviors.AbstractRaftActorBehavior.handleMessage(AbstractRaftActorBehavior.java:404)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>> at org.opendaylight.controller.cluster.raft.behaviors.AbstractLeader.handleMessage(AbstractLeader.java:457)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>> at org.opendaylight.controller.cluster.raft.behaviors.PreLeader.handleMessage(PreLeader.java:49)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>> at org.opendaylight.controller.cluster.raft.RaftActor.possiblyHandleBehaviorMessage(RaftActor.java:302)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>> at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:290)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>>> at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:29)[187:org.opendaylight.controller.sal-clustering-commons:1.4.2.Boron-SR2]
>>>> at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)[181:com.typesafe.akka.persistence:2.4.7]
>>>> at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:97)[187:org.opendaylight.controller.sal-clustering-commons:1.4.2.Boron-SR2]
>>>> at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)[175:com.typesafe.akka.actor:2.4.7]
>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:484)[175:com.typesafe.akka.actor:2.4.7]
>>>> at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>>>> at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:633)[181:com.typesafe.akka.persistence:2.4.7]
>>>> at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:179)[181:com.typesafe.akka.persistence:2.4.7]
>>>> at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[175:com.typesafe.akka.actor:2.4.7]
>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:495)[175:com.typesafe.akka.actor:2.4.7]
>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[175:com.typesafe.akka.actor:2.4.7]
>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:224)[175:com.typesafe.akka.actor:2.4.7]
>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[175:com.typesafe.akka.actor:2.4.7]
>>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>>> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
_______________________________________________
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev